Abstract

In this paper, we consider multiple cache-enabled clients connected to multiple servers through an intermediate network. 
We design several topology-aware coding strategies for such networks. Based on topology richness of the intermediate network, 
and types of coding operations at internal nodes, we define three classes of networks, namely, dedicated, flexible, and linear 
networks. For each class, we propose an achievable coding scheme, analyze its coding delay, and also, compare it with an 
information theoretic lower bound. For flexible networks, we show that our scheme is order-optimal in terms of coding delay 
and, interestingly, the optimal memory-delay curve is achieved in certain regimes. In general, our results suggest that, in
networks with multiple servers, the network topology can be exploited to reduce the service delay.


I. Introduction 

The unprecedented growth of data traffic in networks in recent years demands more efficient use of
storage devices while providing high quality of service (QoS) to the users. Currently, large files are stored on servers and 
users’ requests are stored in queues waiting to get service from them. Naturally, one approach to reduce congestion in such 
networks is to increase the service rate of such servers. However, this will put additional burden on such nodes. As the cost of 
storage devices has decreased over the years, another viable option is to provide geographical content replication in the network 
through use of the so-called low-capacity caching nodes. The idea of using such nodes for data replication and providing easier 
local access to data is already covered in the literature (see for example ID-0). 

Recently, in their seminal work, Maddah-Ali and Niesen considered a single-server network and showed that, through
a two-phase cache placement and content delivery strategy, the server load can be reduced in inverse proportion to the total size
of the cache introduced in the network. In the cache placement phase, contents are stored in the caches without knowing the
actual demands of the users, and in the content delivery phase the server transmits packets to fulfill the demands. The fact that
such a global caching gain can be achieved in such a network is surprising, as the demands are not known a priori in the cache
placement phase.

Maddah-Ali and Niesen's cache placement strategy is based on splitting each file into many pieces and distributing
them throughout the caching nodes without replication. This approach is in contrast to conventional
local cache placement strategies, where a file or a single piece of it is replicated in the caches. The striking feature of their
strategy is that the transmission of a single packet in the content delivery phase can simultaneously serve several users. Imagine
that two pieces of two files are stored at two different caches, and each user requires the piece available at the other. A
single packet containing the sum of the two pieces can be sent to fulfill both users' demands. They have shown that their strategy
is a 12-approximation of the optimal strategy.
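The two-user illustration above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the piece names, the 8-byte piece size, and the use of bitwise XOR as the "sum" (addition over a field of characteristic 2) are assumptions made for illustration.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings (addition in characteristic 2)."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


# Two files A and B, each split into two pieces (hypothetical 8-byte pieces).
A1, A2 = os.urandom(8), os.urandom(8)
B1, B2 = os.urandom(8), os.urandom(8)

# Placement (before demands are known): user 1 holds piece 1 of every file,
# user 2 holds piece 2 of every file.
cache_user1 = {"A1": A1, "B1": B1}
cache_user2 = {"A2": A2, "B2": B2}

# Delivery: user 1 requests A (misses A2) and user 2 requests B (misses B1).
# A single coded packet serves both users at once.
coded_packet = xor_bytes(A2, B1)

# Each user removes the piece it already caches to recover the one it misses.
decoded_A2 = xor_bytes(coded_packet, cache_user1["B1"])
decoded_B1 = xor_bytes(coded_packet, cache_user2["A2"])
```

One transmission replaces the two uncoded transmissions that would otherwise be needed, which is the source of the global caching gain discussed above.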

The network considered in [7] is a simple broadcast network where a packet transmitted by the server arrives unaltered at
all users. A fundamental problem is to see how the network topology affects the optimal coding strategy in both the placement
and delivery phases.

One of the simplest topologies is the tree network. In [8], Maddah-Ali and Niesen proved that their original strategy can
be used directly for such a network, and that what is needed to achieve a 12-approximation of the optimal strategy is a simple
topology-aware routing strategy at the internal nodes: an internal node routes a packet on its output port if the packet is useful
for at least one of the port's children.

While the topology-aware routing scheme for tree networks is shown to be an order-optimal solution, real-world topologies 
are much more sophisticated than the simple tree structure. In this paper, we characterize the effect of network topologies on 
code design and performance analysis of coded caching in a more general setup. In particular, we investigate a multi-server 
network topology where a set of servers are connected to the clients through an error-free and delay-free intermediate network 
of nodes (see Fig. 1). We assume that each node in the intermediate network can perform any causal processing on its input
data, to generate its outgoing data. This can consist of simple routing or more sophisticated network coding schemes. 

The objective considered in [7] is minimizing the traffic load imposed on the single server. However, in general, other
objectives may be of higher importance when designing a network operation strategy. One such key criterion is the service delay
of the network, which is especially critical in content delivery networks. We define the service delay of the
network as the total time required to serve any given set of the clients' requests for a specific strategy. We distinguish between
two types of delay, the Network Delay $T_N$ and the Coding Delay $T_C$, where the total service delay $T$ is given by $T = T_N + T_C$.
To be more precise, $T_N$ is the time it takes for packets to be routed through the network and arrive at their requesting nodes.
Naturally, $T_N$ mainly captures the link and queue delays in the network, which are intrinsic characteristics of the network.
On the other hand, $T_C$ captures the transmission block length required to serve all the users for a specific coding strategy. In
this paper, we focus on the coding delay and design strategies to minimize it.

Fig. 1. Network Model.

We consider three classes of networks: (1) dedicated networks, (2) flexible networks, and (3) linear networks. These networks
are characterized based on the richness of their internal connections, as shown in Fig. 2. In each class, an important aspect of the network
topology is the number of servers connected to the network, and their points of contact. In dedicated networks, we can
dedicate each server to serve a fixed subset of clients, where each server can send a common message to its corresponding
subset, interference-free from other servers. Although in dedicated networks the assignment of clients to the servers is fixed, in
flexible networks the network topology is rich enough to let us adapt these assignments during network operation. Finally, in
linear networks we assume random linear network coding operations at the internal nodes. Consequently, in linear networks,
the network input-output relation is characterized by a random matrix. As we show in this paper, in order to minimize the
coding delay, the coding strategy for each class should carefully utilize the flexibility of that class. As will be shown
subsequently, there exist coding strategies outperforming that of [8] for all three classes of networks. Interestingly, we
obtain an order-optimal solution for flexible networks.

Finally, let us review the notation used in this paper. We use lower-case bold-face symbols to represent vectors, and
upper-case bold-face symbols to represent matrices. For any matrix $\mathbf{A}$, $\mathbf{A}^T$ denotes the transpose of $\mathbf{A}$, and for any vector $\mathbf{a}$,
$\mathbf{a}^\perp$ denotes a vector satisfying $\mathbf{a} \cdot \mathbf{a}^\perp = 0$. For any two sets $S_1$ and $S_2$, the set $S_1 \setminus S_2$ consists of those elements of $S_1$
not present in $S_2$. We also define $[K] = \{1, \dots, K\}$ and $\mathbb{N}$ to be the set of integer numbers. Moreover, $\mathbb{F}_q$ denotes a finite field
with $q$ elements, and $\mathbb{F}_q^{a \times b}$ denotes the set of all $a$-by-$b$ matrices whose elements belong to $\mathbb{F}_q$. Finally, for $x_1, \dots, x_m \in \mathbb{F}_q$,
$L(x_1, \dots, x_m)$ is a random linear combination of $x_1, \dots, x_m$ whose random coefficients are uniformly chosen from $\mathbb{F}_q$.

The rest of the paper is organized as follows. In Section II, we describe the network model and different classes of
networks. In Section III, we review the main results of the paper, present some examples, and discuss their implications. The
next two sections, i.e., Sections IV and V, present the details of the coding strategies proposed for flexible and linear networks,
respectively. Finally, we conclude the paper in Section VI.

II. Model and Assumptions 

Consider $L$ servers connected to $K$ users through a network. By network we mean a Directed Acyclic Graph (DAG)
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$, in which the set of vertices $\mathcal{V}$ consists of internal nodes, and every edge $e \in \mathcal{E}$ of the graph represents an
error-free and delay-free link with a capacity of one symbol per channel use. Each server and each user is connected to the
network by a single link with a capacity of one symbol per channel use. At each channel use, each node inside the network sends
symbols on its output links based on (deterministic or random) functions of the symbols on its input links, without introducing
any delay, where the functions corresponding to different output ports need not be the same. Also, we assume that there is no
inter-link interference. Data is represented by $m$-bit symbols, which are members of a finite field $\mathbb{F}_q$, where $q = 2^m$.

Consider a library of $N$ files $\{W_1, \dots, W_N\}$, each of $F$ bits, which is available to all servers. Each user is also assumed to have
a cache of size $MF$ bits. During its operation, the network experiences two different traffic conditions, namely low-peak and
high-peak, leading to different network transmission costs for the two conditions. Based on the given traffic condition, the
network operates in two distinct phases. The first phase, which is performed during the low-peak condition, is called the cache
content placement phase, in which servers send data to the users without knowing the actual requests of the users. This data
is cached at the users under the size constraint of $MF$ bits and is stored to be used in the future. In the second phase, which is
performed during the high-peak network condition, each user requests one of the files (the demand $d_k$ of user $k$ denotes a request for
file $W_{d_k}$), and according to these requests the servers send proper packets over the network. Subsequently, upon receipt of
packets over the network, users try to decode their requested files with the help of their own cache contents. Assuming that
the cache placement transmission delay during the low-peak condition puts no constraint on overall network performance, the
goal is to design the cache placement strategy such that the service delay at the time of content delivery is minimized.

Channel uses in the network are indexed by time slots $t = 1, 2, \dots$. At time slot $t$, servers transmit symbols $s_1(t), \dots, s_L(t)$
and users receive symbols $r_1(t), \dots, r_K(t)$ without delay. We consider the most general case, i.e.,

$$r_k(t) = f_k(s_1(t), \dots, s_L(t)), \quad k = 1, \dots, K,$$

in which we have assumed that the network is memoryless across time slots. The functions $f_k(\cdot)$ depend on the topology of the
network and the local operations of the nodes inside the network.

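We define:

$$\mathbf{s}(t) = [s_1(t), \dots, s_L(t)]^T, \qquad \mathbf{r}(t) = [r_1(t), \dots, r_K(t)]^T,$$

where $\mathbf{s} \in \mathbb{F}_q^{L \times 1}$ and $\mathbf{r} \in \mathbb{F}_q^{K \times 1}$.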

In the first phase, users store data from the servers without knowing the actual requests. The only concern in the first phase
is respecting the memory constraint of each user. In the second phase, however, we focus on the time needed to deliver the
requested files to the users. The second phase consists of $T_C$ time slots (channel uses). In other words, during the second phase,
servers sequentially transmit $\mathbf{s}(1), \mathbf{s}(2), \dots, \mathbf{s}(T_C)$, and the users receive $\mathbf{r}(1), \mathbf{r}(2), \dots, \mathbf{r}(T_C)$. Consequently, $T_C(d_1, \dots, d_K)$
is the number of time slots required to satisfy demands $d_1, \dots, d_K$. Then, we define the optimum Coding Delay as:

$$D^* = \min \max_{d_1, \dots, d_K} T_C(d_1, \dots, d_K), \tag{1}$$

where the minimization is over all strategies. In this paper, we are interested in characterizing $D^*$ for a network, given its
specific topology.

For a given network topology, the network input-output relation depends on operational design of internal nodes. As we 
will show, the richer the network topology is, the broader the design space will be. Therefore, we consider the following three 
classes of network topologies: 

• Dedicated Networks

In this class of networks, each packet transmitted by a server is routed to a fixed subset of the users. In other words,
we can dedicate each server to a fixed subset of users, and this server can send packets to these users concurrently and
without interference from other servers. We assume these subsets to be non-overlapping, so that each user is assigned to
a single server. Also, we assume we can balance these assignments such that the number of users assigned to a server
is almost the same for all servers. If the network topology allows us to perform such assignments, we call the network a
Dedicated Network.

More formally, there exists a coding (in this case just routing would suffice) strategy at the network nodes such that there
exists a partitioning $\{P_1, \dots, P_L\}$ of $[K'] = \{1, 2, \dots, K'\}$ where

$$|P_l| = \frac{K'}{L}, \quad l = 1, \dots, L,$$

$$\forall k = 1, \dots, K: \text{ if } k \in P_l, \text{ then } f_k(s_1, \dots, s_L) = s_l, \tag{2}$$

in which $K'$ is the smallest number larger than or equal to $K$ which is divisible by $L$.

Consider Fig. 2(a), in which $L = 2$ servers are connected to $K = 4$ users via a dedicated network. In this example, we
have $K' = K$, and it is easy to verify that we can find a routing strategy at the intermediate nodes such that:

$$P_1 = \{1, 2\}, \quad P_2 = \{3, 4\},$$
$$f_1(s_1, s_2) = f_2(s_1, s_2) = s_1,$$
$$f_3(s_1, s_2) = f_4(s_1, s_2) = s_2.$$

Fig. 2. Examples for dedicated, flexible, and linear networks.


• Flexible Networks

In this class of networks, we assume that there exists a coding (routing) strategy at the network nodes such that for every
partitioning $\{P_1, \dots, P_L\}$ of $[K] = \{1, 2, \dots, K\}$ we have:

$$\forall k = 1, \dots, K: \text{ if } k \in P_l, \text{ then } f_k(s_1, \dots, s_L) = s_l. \tag{3}$$

It should be noted that in dedicated networks each server is assigned to a fixed subset of users, while in flexible
networks we can flexibly change these assignments during the data delivery phase. In the example shown in Fig. 2(b),
we have chosen two sample partitionings, i.e., $P_1 = \{1, 4\}$, $P_2 = \{2, 3\}$ for the top figure, and $P_1 = \{2, 4\}$, $P_2 = \{1, 3\}$
for the bottom figure. It is obvious that every flexible network is a dedicated network, but the converse is not true. Hence,
flexible networks are generally richer than dedicated networks in terms of their internal connectivity.

• Linear Networks

In the aforementioned dedicated and flexible networks, the intermediate nodes should know the topology of the network
in order to properly route their input data onto their output ports. However, in the case of linear networks, we
assume that such knowledge is not available at the intermediate nodes. Thus, we assume that each node generates a random
linear combination of the data at its input ports to be transmitted on its output ports. Consequently, the overall transmit and
receive vectors of the network are linearly related at each time slot:

$$\mathbf{r}(t) = \mathbf{H}\mathbf{s}(t), \tag{4}$$

where $\mathbf{H} \in \mathbb{F}_q^{K \times L}$ is called the Network Transfer Matrix (NTM). Let us define:

$$\mathbf{X} \triangleq [\mathbf{s}(1), \mathbf{s}(2), \dots, \mathbf{s}(T_C)],$$
$$\mathbf{Y} \triangleq [\mathbf{r}(1), \mathbf{r}(2), \dots, \mathbf{r}(T_C)]. \tag{5}$$

We call the matrices $\mathbf{X} \in \mathbb{F}_q^{L \times T_C}$ and $\mathbf{Y} \in \mathbb{F}_q^{K \times T_C}$ the transmit and receive blocks, respectively. Then, the transmit and receive
blocks are also linearly related:

$$\mathbf{Y} = \mathbf{H}\mathbf{X}. \tag{6}$$

In Linear Networks, we assume that the network topology is rich enough to guarantee that the elements of $\mathbf{H}$ are i.i.d. random
variables. Similar to most existing papers employing random linear network coding, we assume a large enough $q = 2^m$ to
ensure that the NTM exhibits full-rank matrix properties with high probability [9], [10]. Also, we assume a uniform distribution
on the elements of $\mathbf{H}$, which is a proper assumption for large-scale networks with many sources of randomness [11]–[14].






















































Fig. 3. The Super-Server Strategy. 


Finally, for later reference, define $\mathbf{h}_k$ as

$$\mathbf{h}_k = [h_{k,1}, \dots, h_{k,L}]^T, \quad k = 1, \dots, K. \tag{7}$$

It should be noted that we assume a static network transfer matrix $\mathbf{H}$, i.e., it does not change for the duration of $T_C$
time slots. As changes in the network transfer matrix are due to topology changes (e.g., failure of a node), this assumption
is valid in most practical scenarios. Fig. 2(c) illustrates an example of a linear network in the case of $L = 2$ and $K = 4$.
Finally, it should be noted that in this paper we assume $N \ge K$. This assumption leads to a clearer presentation
in the rest of this paper and also excludes the possibility of using uncoded multicasting schemes that may trivially be
adopted for the case of a small number of files. Extending the results to the case $N < K$ is straightforward, and the readers are
referred to [7].

Remark 1. It should be noted that if a server is connected to the network by a number of links (each of integer capacity) with 
the total capacity of t symbols per time slot, our model can accommodate this scenario by splitting this server into t separate 
servers. 

Remark 2. The random linear network coding approach at intermediate nodes is also used in other papers (e.g., [20])
in the context of unicasting via interference alignment.


III. Main Results: Review and Discussion 

The simplest approach in designing a coding scheme for the multi-server case is to directly transform it into a single-server
scenario and use the single-server scheme of Maddah-Ali and Niesen. Such an approach can be adopted simply by adding a Super Server node and
connecting it with edges of infinite capacity to all other servers (see Fig. 3). As shown in [8], we only need to route packets
that are transmitted by the super-server to those users that can benefit from receiving them. For tree networks, this approach
results in the following simple topology-aware routing scheme: at each interior node, every packet received at the input port
that benefits at least one of the descendants of the node is sent on the corresponding output port. As proved in [8], the minimum
traffic load imposed on each link, in the scaling sense, can be achieved by this simple routing scheme. This approach also
leads to an order-optimal coding delay for tree networks under our formulation.

One can, however, think of another naive and simple approach to the multi-server problem. We can simply dedicate each
server to a subset of users and make it responsible for satisfying the requests of that subset. It is clear
that, in order to prevent congestion at a specific server, we should balance the loads of the servers so that each of the $L$ servers
is responsible for about $K/L$ users. Consequently, one can easily arrive at the following theorem for the coding delay in
dedicated networks:


Theorem 1. The coding delay for a dedicated network is upper bounded by a piecewise-linear curve with corner points

$$D^*(M) \le \frac{K'(1 - M/N)}{\min(K', L + K'M/N)} \frac{F}{m}, \tag{8}$$

where $\frac{K'M}{LN} \in \mathbb{N}$ should be satisfied, and $K'$ is the smallest number larger than or equal to $K$ which is divisible by $L$.














The proof of Theorem 1 is straightforward, and thus we only sketch it here. First, let us review the main concept
behind the coded caching scheme for a single server in a broadcast scenario [7]. In this case, if the users have no cache,
it is clear that the server should send all the requested files to the users in sequence (considering that the users
request different files). This leads to a total of $KF$ bits to be transmitted. Since the server is only able to transmit
$m$ bits (a symbol in $\mathbb{F}_q$) in each time slot, the coding delay will be $KF/m$ time slots. By providing caches at the users, the local
caching gain reduces the coding delay to $K(1 - M/N)F/m$. The main result in [7] indicates that by exploiting the additional
global caching gain, the coding delay for $KM/N \in \mathbb{N}$ reduces to:

$$\frac{K(1 - M/N)}{1 + KM/N} \frac{F}{m}, \tag{9}$$

which is order-optimal for this scenario.
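The three single-server delays just described (no cache, local caching gain only, and local plus global caching gain) can be compared numerically. The sketch below is only an illustration: the function names are hypothetical, delays are expressed in units of $F/m$ time slots, and $K$, $M$, $N$ are assumed to be integers.

```python
from fractions import Fraction


def uncached_delay(K: int) -> Fraction:
    """Delay with no cache: all K requested files are sent in full (units of F/m)."""
    return Fraction(K)


def local_gain_delay(K: int, M: int, N: int) -> Fraction:
    """Delay with the local caching gain only: K(1 - M/N), in units of F/m."""
    return K * (1 - Fraction(M, N))


def coded_delay_single_server(K: int, M: int, N: int) -> Fraction:
    """Delay of expression (9): K(1 - M/N) / (1 + KM/N), in units of F/m."""
    t = Fraction(K * M, N)
    if t.denominator != 1:
        # Expression (9) is stated for integer KM/N; other points follow by
        # interpolating between corner points, which this sketch does not do.
        raise ValueError("KM/N must be an integer")
    return K * (1 - Fraction(M, N)) / (1 + t)


# K = 4 users, N = 4 files, cache size M = 1 file.
no_cache = uncached_delay(4)
local_only = local_gain_delay(4, 1, 4)
coded = coded_delay_single_server(4, 1, 4)
```

For these values the three delays are 4, 3, and 3/2 (in units of $F/m$), so the global caching gain halves the delay that local caching alone would give.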

As we extend to the multi-server case, let us assume for simplicity that $K$ is divisible by $L$. Splitting the original $L$-server
problem with $K$ users into $L$ single-server problems with $K/L$ users each is possible in this case. Since the sub-networks may operate
in parallel, the delay is further reduced to:

$$T_C = \frac{\frac{K}{L}(1 - M/N)}{1 + \frac{K}{L}\frac{M}{N}} \frac{F}{m} = \frac{K(1 - M/N)}{L + KM/N} \frac{F}{m},$$

where $KM/LN \in \mathbb{N}$. Since any scheme can benefit at most all the $K$ users simultaneously, the total multicasting gain
of any scheme is at most $K$, and the denominator should be capped at $K$ (hence the min operator in the denominator of (8)).
Extension to the case where $K$ is not divisible by $L$ can be accomplished by adding virtual users. The following example
compares the above two naive approaches:


Example 1. Consider the network shown in Fig. 3 for $K = 4$ users. We also assume the library contains $N = 4$ files, and each
user can store $M = 2$ files during the cache content placement phase. By adding a super server, a tree network is obtained,
and in the delivery phase, the scheme in [8] suggests sending

$$R_1 = \frac{K(1 - M/N)}{1 + KM/N} F = \frac{4(1 - 2/4)}{1 + 4 \times 2/4} F = \frac{2}{3} F$$

bits at the super server's output. In their scheme, at each node only those packets benefiting the descendants of an output port
are copied onto that port. However, in our case each packet benefits $1 + KM/N = 3$ users, and thus should be copied onto both
output ports of node $n_1$. This results in:

$$R_2 = R_1,$$

and since we assumed a capacity of one symbol per time slot for each internal edge, the delay of this scheme is:

$$T_C = \frac{R_2}{m} = \frac{2}{3} \frac{F}{m}. \tag{10}$$

At this stage, the key question is whether it is possible to further reduce the required number of time slots. In fact,
a closer look at this network reveals that we can reduce it to a dedicated network with:

$$P_1 = \{1, 2\}, \quad P_2 = \{3, 4\}.$$

Therefore, the original problem can be divided into two sub-problems (see Fig. 4), and each server can address the load of its
corresponding sub-network by sending

$$R_3 = \frac{\frac{K}{L}(1 - M/N)}{1 + \frac{1}{L}\frac{KM}{N}} F = \frac{F}{2}$$

bits. Since the sub-networks operate in parallel, the delay of this scheme will be

$$T_C = \frac{R_3}{m} = \frac{F}{2m} \tag{11}$$

time slots.

Fig. 4. Example 1.

The above example shows that, although the scheme in [8] is order-optimal for tree networks, designing a
topology-aware scheme may lead to a better constant factor.
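The two delays of Example 1 can be reproduced from the closed-form expressions. The following is a minimal sketch under stated assumptions: the function names are hypothetical, delays are in units of $F/m$, and the super-server delay is taken equal to the single-server expression because, in this particular example, every packet is copied onto both output ports ($R_2 = R_1$).

```python
from fractions import Fraction


def single_server_delay(K: int, M: int, N: int) -> Fraction:
    """K(1 - M/N) / (1 + KM/N), in units of F/m (integer KM/N assumed)."""
    return K * (1 - Fraction(M, N)) / (1 + Fraction(K * M, N))


def dedicated_delay(K: int, L: int, M: int, N: int) -> Fraction:
    """Corner point of the Theorem 1 bound, in units of F/m."""
    K_pad = -(-K // L) * L  # smallest multiple of L that is >= K (K' in the text)
    if Fraction(K_pad * M, L * N).denominator != 1:
        raise ValueError("K'M/(LN) must be an integer at a corner point")
    gain = min(K_pad, L + Fraction(K_pad * M, N))
    return K_pad * (1 - Fraction(M, N)) / gain


# Example 1: K = 4 users, L = 2 servers, N = 4 files, M = 2.
super_server = single_server_delay(4, 2, 4)   # R_2 = R_1 in this example
dedicated = dedicated_delay(4, 2, 2, 4)
```

Here the super-server approach gives 2/3 and the dedicated assignment gives 1/2 (both in units of $F/m$), matching (10) and (11).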

Next, let us consider another class of networks with more flexibility, i.e. Flexible Networks. In such networks, similar to 
dedicated networks, we can assign a subset of users to each server, and the network allows parallel operation of the servers. 
However, unlike dedicated networks, such assignment can be changed arbitrarily in subsequent transmissions. Such extra 
freedom in user assignments allows a significant reduction in the coding delay as shown in the following example. 

Example 2 ($L = 2$, $K = 4$, $N = 4$, $M = 1$). For the single-server case, the scheme proposed in [7] achieves the following delay
for $M = 1$:

$$T_C = \frac{K(1 - M/N)}{1 + KM/N} \frac{F}{m} = \frac{3F}{2m}.$$

In order to gain better insight into this result, consider Fig. 5(a), which shows the cache content placement and the delivery
scheme for requests $A, B, C, D$ by users 1, 2, 3, 4, respectively. In the cache content placement phase, each file is divided into
four equal-sized parts and cached as shown in Fig. 5(a). In the delivery phase, the single server sends the following data in
sequence:

$$A_2 + B_1, \quad A_3 + C_1, \quad A_4 + D_1, \quad B_3 + C_2, \quad B_4 + D_2, \quad C_4 + D_3.$$

As a result, six transmissions are required, each with delay $\frac{F}{4m}$. Thus, the total delay will be $T_C = 6 \cdot \frac{F}{4m} = \frac{3F}{2m}$. In
the above scheme, each transmission benefits one pair of users, and is of no value to the other pair.

If we have two servers, by the definition of flexible networks each server is able to transmit given data to a pair of users
simultaneously and interference-free from the transmission of the other server. In Fig. 5(b), the transmissions of the left and right
servers are colored blue and red, respectively. Thus, a pair of transmissions in Fig. 5(a) can be sent simultaneously, as
shown in Fig. 5(b), resulting in the achievable pair $(M, T_C) = (1, \frac{3F}{4m})$. Thus, exploiting the extra flexibility of the network
in this example improves the coding delay compared with the single-server case.
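The pairing argument of Example 2 can be checked with a short script. This is a sketch under assumptions: user $k$ is taken to cache part $k$ of every file (consistent with the six coded packets listed above), and the specific way packets are paired across the two servers is one valid choice, not necessarily the one drawn in Fig. 5(b).

```python
from fractions import Fraction
from itertools import combinations

K = 4
users = range(1, K + 1)
files = {1: "A", 2: "B", 3: "C", 4: "D"}  # user k requests files[k]

# One coded packet per user pair {i, j}: part j of user i's file plus
# part i of user j's file. Each packet is useful to users i and j only.
packets = {
    frozenset((i, j)): f"{files[i]}{j}+{files[j]}{i}"
    for i, j in combinations(users, 2)
}

# Two servers: in each slot, send two packets aimed at disjoint user pairs,
# which a flexible network can deliver without mutual interference.
slots = []
remaining = set(packets)
while remaining:
    pair = min(remaining, key=sorted)          # deterministic choice
    complement = frozenset(users) - pair       # the other two users
    slots.append((packets[pair], packets[complement]))
    remaining -= {pair, complement}

piece_delay = Fraction(1, 4)                   # each part takes F/(4m) slots
single_server_delay = len(packets) * piece_delay   # units of F/m
two_server_delay = len(slots) * piece_delay        # units of F/m
```

The six packets collapse into three slots, so the delay drops from 3/2 to 3/4 (in units of $F/m$), as stated in the example.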

In dedicated networks, we exploit the network topology to assign a fixed number of users to each server. In this way, a user
receives packets only from a certain server, and this assignment is fixed during the course of transmission. In flexible networks,
however, users can be served by different servers at different time slots, while the assignment strategy is fixed for each server.
Fig. 6 shows two servers connected to three users through such a flexible network. The blue packets originating from server 1
are intended for one user (which may change at different time slots) and the red packets originating from server 2 are intended
for two users (which may change at different time slots). We associate blue packets with Strategy 1 and red
packets with Strategy 2. Fig. 6 shows consecutive transmissions in such a network, where Strategy 1 is associated with server 1
and Strategy 2 with server 2. In general, we associate Strategy $p$ with a packet if it is intended for $p$ users. Now, if we fix a
strategy for a server, it means that all the packets transmitted by that server have the same strategy. It is worth mentioning
that packets received by a user do not necessarily have the same strategy, since they may have arrived from different servers
(see Fig. 6).

















Fig. 5. Flexible Network Example 2.

Fig. 6. Server 1 and blue packets are associated with Strategy 1, and Server 2 and red packets are associated with Strategy 2 (horizontal axis: time).


Consider Server $i$ with Strategy $p_i$. We assign a fraction of $F_i$ bits of each file to be delivered by Server $i$. In order to
employ the scheme in [7] for this server, we allocate a memory of size $M_i$ bits from every user to be used only by Server
$i$, where

$$M_i = \frac{N}{K}(p_i - 1)F_i.$$

Therefore, Server $i$ can deliver $F_i$ bits to all the users in $T_C(i)$ time slots, where

$$T_C(i) = \frac{K\left(1 - \frac{M_i/F_i}{N}\right)}{1 + \frac{K M_i/F_i}{N}} \frac{F_i}{m} = \frac{K - p_i + 1}{p_i} \frac{F_i}{m}. \tag{12}$$






























































We assume that a routing strategy exists where packets from different servers do not interfere with each other. In this case,
the total delay is limited by the maximum delay of the servers. Therefore, in order to balance the servers' loads, we can
simply set:

$$F_i = \alpha \frac{p_i}{K - p_i + 1},$$

where $\alpha$ does not depend on $i$ and satisfies:

$$\sum_{i=1}^{L} F_i = \alpha \sum_{i=1}^{L} \frac{p_i}{K - p_i + 1} = F.$$

Therefore,

$$\alpha = \frac{F}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}}. \tag{13}$$

Since the total memory is $M$, we have

$$M = \sum_{i=1}^{L} M_i / F = \frac{N}{KF} \sum_{i=1}^{L} (p_i - 1) F_i = \frac{N}{K} \frac{\sum_{i=1}^{L} \frac{p_i (p_i - 1)}{K - p_i + 1}}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}}. \tag{14}$$

Hence,

$$T_C = \alpha \frac{1}{m} = \frac{1}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}} \frac{F}{m}. \tag{15}$$

The aforementioned result is based on the strong assumption that a routing strategy exists for parallel and interference-free
transmission of the packets. In Section IV, we show that such a strategy does in fact exist for flexible networks. The preceding
discussion is a rough proof of the following theorem:


Theorem 2. Suppose a flexible network with $L$ servers. Then, for all $Q \in \{0, \dots, K - L\}$ the following $(M, T_C)$ pairs (and
the straight lines connecting them) are achievable

$$(M, T_C) = \left( \frac{N}{K} \frac{\sum_{i=1}^{L} \frac{p_i (p_i - 1)}{K - p_i + 1}}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}}, \; \frac{1}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}} \frac{F}{m} \right) \tag{16}$$

for all $p_1 + \dots + p_L = K - Q$, where $p_i \ge 2$,
and thus lead to an upper bound for the optimum coding delay $D^*$.

Proof: See Section IV for the proof. ■
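The achievable pairs of Theorem 2 are easy to evaluate for a given choice of strategies. The following is a minimal sketch, assuming the form of (14) and (15) derived above; the function name is hypothetical, $T_C$ is returned in units of $F/m$, and only basic sanity checks on the $p_i$ are performed (the theorem's own constraints on $p_i$ and $Q$ are not enforced).

```python
from fractions import Fraction


def flexible_corner_point(p, K: int, N: int):
    """Return (M, T_C) for strategies p = (p_1, ..., p_L); T_C is in units of F/m."""
    if min(p) < 1 or sum(p) > K:
        raise ValueError("need 1 <= p_i and p_1 + ... + p_L <= K")
    # weights[i] = p_i / (K - p_i + 1), proportional to the share F_i of Server i.
    weights = [Fraction(p_i, K - p_i + 1) for p_i in p]
    total = sum(weights)
    # M = (N/K) * sum_i (p_i - 1) * weights[i] / sum_i weights[i], as in (14).
    M = Fraction(N, K) * sum(w * (p_i - 1) for w, p_i in zip(weights, p)) / total
    # T_C = 1 / sum_i weights[i], as in (15).
    return M, 1 / total


# Example 2 revisited: L = 2 servers, K = 4, N = 4, both servers use Strategy 2.
M_ex2, T_ex2 = flexible_corner_point((2, 2), K=4, N=4)
```

With $p_1 = p_2 = 2$ this returns $M = 1$ and $T_C = 3/4$, which agrees with the pair $(1, \frac{3F}{4m})$ found in Example 2.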

In the following example, we present a network in which employing the flexible network strategy goes beyond
the earlier results and yields a scaling improvement in the coding delay compared with the super-server strategy.


Example 3. Consider the network depicted in Fig. 7(a). In this network, $L$ (an even number) servers are connected to
$K = L^2/2$ users via $L$ intermediate nodes, where each intermediate node has dedicated links to all the users. We also assume:

$$\frac{M}{N} = \frac{2}{L^2}\left(\frac{L}{2} - 1\right).$$

In order to use the super-server strategy with the tree approach proposed in [8], we need to choose an appropriate tree inside
the network. It can be easily verified that the tree illustrated in Fig. 7(b) is the best choice. Therefore, $R_1$, the minimum rate
of the super-server, is given by

$$R_1 = \frac{K(1 - M/N)}{1 + KM/N} F = \frac{\frac{L^2}{2}(1 - M/N)}{\frac{L}{2}} F = L(1 - M/N)F.$$


















The load $R_2$ on each server consists of those packets that are useful for at least one user that is a descendant of that server.
We know that each packet benefits a subset of users of size:

$$1 + \frac{KM}{N} = \frac{L}{2}.$$

Therefore, the ratio of packets routed on a specific edge to the total number of packets is:

$$\frac{R_2}{R_1} = \frac{\sum_{i=1}^{L/2} \binom{L/2}{i} \binom{L^2/2 - L/2}{L/2 - i}}{\binom{L^2/2}{L/2}} = 1 - \left(1 - \frac{L/2}{L^2/2}\right)\left(1 - \frac{L/2}{L^2/2 - 1}\right) \cdots \left(1 - \frac{L/2}{L^2/2 - (L/2 - 1)}\right) \approx 1 - e^{-1/2},$$

for large $L$. Thus, an almost constant fraction of the packets generated by the super-server will be routed on each edge. This will result
in a delay of:

$$T_C = \frac{R_2}{m} \approx \left(1 - e^{-1/2}\right) L \left(1 - \frac{M}{N}\right) \frac{F}{m} \tag{17}$$

time slots.

A closer look at the network topology shows that the network is indeed flexible. Setting $p_i = L/2$, which satisfies $\sum_{i=1}^{L} p_i = K$,
and using memory size $M$ where

$$M = \frac{N}{K} \frac{\sum_{i=1}^{L} \frac{p_i (p_i - 1)}{K - p_i + 1}}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}} = \frac{N}{L^2/2}\left(\frac{L}{2} - 1\right), \tag{18}$$

Theorem 2 can be used to achieve the following coding delay:

$$T_C = \frac{1}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}} \frac{F}{m} = \frac{1}{L \cdot \frac{L/2}{L^2/2 - L/2 + 1}} \frac{F}{m} = \left(1 - \frac{M}{N}\right) \frac{F}{m}. \tag{19}$$

The delay in (19) is not only a scaling improvement compared with the super-server tree-based strategy with delay
$O(L)$, but also the optimal delay. This is due to the fact that each user can store at most $\frac{M}{N}F$ bits of each file.
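The two claims of Example 3 can be sanity-checked numerically: the fraction of super-server packets routed on one edge approaches $1 - e^{-1/2}$, and the flexible scheme with $p_i = L/2$ has delay exactly $(1 - M/N)F/m$. The sketch below uses hypothetical function names and assumes the reconstructed expressions above, in particular that the routed fraction equals one minus the probability that a packet's $L/2$ beneficiaries avoid all $L/2$ users below a given server.

```python
import math
from fractions import Fraction


def edge_fraction(L: int) -> Fraction:
    """R_2 / R_1: fraction of packets useful to at least one of a server's L/2 users."""
    K, half = L * L // 2, L // 2
    return 1 - Fraction(math.comb(K - half, half), math.comb(K, half))


def flexible_delay(L: int) -> Fraction:
    """Delay from Theorem 2 with p_i = L/2 for all i, in units of F/m."""
    K, half = L * L // 2, L // 2
    return 1 / (L * Fraction(half, K - half + 1))


def one_minus_memory_ratio(L: int) -> Fraction:
    """1 - M/N for the memory size assumed in Example 3."""
    K, half = L * L // 2, L // 2
    return 1 - Fraction(half - 1, K)


limit = 1 - math.exp(-0.5)
gap_at_200 = abs(float(edge_fraction(200)) - limit)
```

For $L = 200$ the routed fraction is already within about 0.002 of $1 - e^{-1/2} \approx 0.393$, while `flexible_delay(L)` and `one_minus_memory_ratio(L)` coincide for every even $L$.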

The optimality of the preceding coding scheme can be generalized to any flexible network where $K$ is divisible by $L$, as the
following theorem states.

Theorem 3. If $K$ is divisible by $L$, then the upper bound in Theorem 2 is optimal within a multiplicative constant gap.


Proof: See Section IV for the proof. ■

For flexible and topologically complex networks, finding a proper routing strategy that achieves the optimal coding delay 
may not be straightforward. To overcome this difficulty, internal nodes can perform simple random linear network coding which 
is oblivious to the network’s topology. Although this strategy may not be optimal, it has the advantage of being practical and 
robust. In this way, the network model reduces to a linear network model and the following theorem provides an achievable 
coding delay for such networks. 
















Fig. 7. Example 3: (a) Flexible Network, (b) Tree Network.

Fig. 8. Example 4: N = 4, K = 4.


Theorem 4. The coding delay for a linear network with $L$ servers is upper bounded by a piecewise-linear curve with the
corner points

$$D^*(M) \le \frac{K(1 - M/N)}{\min(K, L + KM/N)} \frac{F}{m}, \tag{20}$$

where $KM/N \in \mathbb{N}$ should be satisfied.


Proof: See Section V for the proof. ■

In linear networks, a packet intended for a certain number of users, in general, interferes with all other users. Proper
pre-coding schemes can be adopted to reduce interference in such networks. Consequently, simultaneous transmission of multiple
packets will further reduce the network coding delay. In order to clarify the implications of Theorem 4, we present the following
example:


Example 4 (K = 4, N = 4). Consider a network with K = N = 4. Using Theorem 4, the coding delay for any $L \in \{1, 2, 3, 4\}$ is given by

$$T_C = \frac{4 - M}{\min(4,\, L + M)}\,\frac{F}{m}.$$

The above delay is plotted in Fig. 8 for $L \in \{1, 2, 3, 4\}$. The problem for L = 1 reduces to that of [7]. For L = 4, we obtain a multiplexing gain of 4 by constructing four parallel interference-free links, each from one server to one user (e.g., through Singular Value Decomposition), and the optimal coding delay is achieved. Networks with $L \in \{2, 3\}$ are interesting cases where interference management is required to achieve the gain $\min(4, L + M)$ in the denominator. The details of the coding strategy are rather involved, and we defer them to Appendices B and C.
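The corner points of Theorem 4 are easy to tabulate. The following is a minimal sketch, not part of the original paper: the function name `linear_delay` is hypothetical, and delays are expressed in units of F/m using exact rational arithmetic.

```python
from fractions import Fraction

def linear_delay(K, N, L, M):
    """Corner-point delay of Theorem 4, in units of F/m.

    Only valid at corner points, i.e. when K*M/N is an integer.
    """
    t = Fraction(K * M, N)
    if t.denominator != 1:
        raise ValueError("K*M/N must be an integer at a corner point")
    # K(1 - M/N) divided by the combined gain min(K, L + KM/N)
    return Fraction(K * (N - M), N) / min(K, L + t)

# Tabulate Example 4 (K = N = 4) for every number of servers L and cache size M.
table = {L: [linear_delay(4, 4, L, M) for M in range(5)] for L in range(1, 5)}
for L, row in table.items():
    print(L, [str(v) for v in row])
```

For L = 2 the row reads 2, 1, 1/2, 1/4, 0, which agrees with the delays derived case by case in Appendix B.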


IV. Flexible Networks: Details 

In this section, we present an achievable scheme for flexible networks leading to the result given in Theorem 2. We also provide a proof for the optimality result in Theorem 3 through cut-set analysis.

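For the achievability part, we need to provide the cache content placement and content delivery strategies. Let us start by defining the following parameters: let $Q \in \{0, \dots, K - L\}$ and consider an integer solution of the following equation:

$$p_1 + \dots + p_L + p_{L+1} = K,$$

where $p_{L+1} = Q$ and $p_i \ge 2$, $i = 1, \dots, L$. We also define

$$\begin{aligned}
\alpha_i &= \binom{K}{p_i - 1}, && i = 1, \dots, L,\\
\gamma_i &= \frac{(K - p_i)!\, p_i!}{p_1! \cdots p_{L+1}!}, && i = 1, \dots, L+1,\\
x &= \frac{1}{\sum_{i=1}^{L} \alpha_i \gamma_i},\\
x_i &= \begin{cases} \gamma_i x, & i = 1, \dots, L,\\ 0, & i = L+1. \end{cases}
\end{aligned} \tag{21}$$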


Cache Placement Strategy: First, split each file $W_n$ into L sub-files

$$W_n = (W_n^i : i = 1, \dots, L),$$

where $W_n^i$ is of size $\alpha_i x_i F$. Then, split each sub-file $W_n^i$ into $\alpha_i$ equal-sized mini-files:

$$W_n^i = (W^i_{n,\tau_i} : \tau_i \subset [K],\ |\tau_i| = p_i - 1).$$

Finally, split each mini-file $W^i_{n,\tau_i}$ into $\gamma_i$ equal-sized pico-files of size $xF$ bits:

$$W^i_{n,\tau_i} = (W^{i,j}_{n,\tau_i} : j = 1, \dots, \gamma_i),$$

where $\gamma_i$ is an integer. For each user k, we cache pico-file $W^{i,j}_{n,\tau_i}$ if $k \in \tau_i$, for all possible $i$, $j$, $n$. Then, the required memory size for each user is:

$$M = \frac{1}{F}\, N \sum_{i=1}^{L} \binom{K-1}{p_i - 2} \gamma_i\, x F = \frac{N}{K}\, \frac{\sum_{i=1}^{L} \frac{p_i (p_i - 1)}{K - p_i + 1}}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}}, \tag{22}$$

which is consistent with the assumptions of Theorem 2.
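The placement parameters can be checked numerically. The sketch below is an illustration only: the function names `flexible_parameters` and `closed_form` are hypothetical, memory is measured in units of F and delay in units of F/m. It computes $\alpha_i$, $\gamma_i$, $x$ from their definitions and compares the resulting memory and delay with the closed-form expressions in terms of $p_i$.

```python
from fractions import Fraction
from math import comb, factorial, prod

def flexible_parameters(K, N, parts, Q=0):
    """parts = (p_1, ..., p_L), every p_i >= 2, and sum(parts) + Q == K."""
    assert sum(parts) + Q == K and all(p >= 2 for p in parts)
    denom = prod(factorial(p) for p in parts) * factorial(Q)  # p_1! ... p_{L+1}!
    alpha = [comb(K, p - 1) for p in parts]
    gamma = [factorial(K - p) * factorial(p) // denom for p in parts]
    x = Fraction(1, sum(a * g for a, g in zip(alpha, gamma)))
    # Cache per user: N files, C(K-1, p_i-2) mini-files each, gamma_i pico-files of size x
    M = N * sum(comb(K - 1, p - 2) * g for p, g in zip(parts, gamma)) * x
    # Delay: one pico-file length per slot, one slot per partition of [K]
    T = (factorial(K) // denom) * x
    return alpha, gamma, x, M, T

def closed_form(K, N, parts):
    """Memory and delay written directly in terms of the p_i."""
    S = sum(Fraction(p, K - p + 1) for p in parts)
    M = Fraction(N, K) * sum(Fraction(p * (p - 1), K - p + 1) for p in parts) / S
    return M, 1 / S

print(flexible_parameters(4, 4, (2, 2)))
```

For K = N = 4 and $p_1 = p_2 = 2$ (so Q = 0) this gives M = 1 and a delay of 3/4, which equals $1 - M/N$.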


Content Delivery Strategy: Define $P^i_1, \dots, P^i_{\binom{K}{p_i}}$ to be the collection of all $p_i$-subsets of $[K]$ for all $i = 1, \dots, L+1$. The delivery phase consists of $\frac{K!}{p_1! \cdots p_{L+1}!}$ transmit slots. Each transmit slot is in one-to-one correspondence with one $(p_1, \dots, p_{L+1})$-partition of $[K]$. Consider the transmit slot associated with the partition

$$\{P^1_{\theta_1}, \dots, P^{L+1}_{\theta_{L+1}}\},$$

where $\theta_i \in \{1, \dots, \binom{K}{p_i}\}$. Then, server i sends

$$\bigoplus_{r \in P^i_{\theta_i}} W^{i,\,N(P^i_{\theta_i})}_{d_r,\, P^i_{\theta_i} \setminus \{r\}}$$

to the subset of users $P^i_{\theta_i}$, interference-free from other servers, where the sum is in $\mathbb{F}_q$ and is over all $r \in P^i_{\theta_i}$. Since we have assumed a flexible network, simultaneous transmission by all servers is feasible. Also, the index $N(P^i_{\theta_i})$ is chosen such that each new transmission consists of a fresh (not transmitted earlier) pico-file. Obviously, the virtual server L + 1 does not transmit any packet.

Fig. 9. Flexible network file distribution for proof of Theorem 2.

Since each pico-file consists of $xF/m$ symbols, at each transmission slot the servers should send a block of size L-by-$\frac{xF}{m}$. Also, since this action should be performed for all $\frac{K!}{p_1! \cdots p_{L+1}!}$ slots, the delay of this scheme will be:

$$T_C = \frac{K!}{p_1! \cdots p_{L+1}!} \times x \times \frac{F}{m} = \frac{1}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}}\, \frac{F}{m}, \tag{23}$$

as stated in Theorem 2. Consequently, if we show that through the aforementioned number of transmit slots all users will be able to recover their requested files, the proof is complete.

Correctness Proof: The main theme of this scheme is to divide each file into L sub-files, and to assign each sub-file to a single server. Then, each server's task is to deliver the assigned sub-files to the desired users (see Fig. 9).

Consider server i. This server handles sub-files $W_n^i$, $n \in [N]$, through the following delivery tasks:

$$W^i_{d_1} \xrightarrow{\ \text{server } i\ } \text{User } 1, \qquad W^i_{d_2} \xrightarrow{\ \text{server } i\ } \text{User } 2, \qquad \dots, \qquad W^i_{d_K} \xrightarrow{\ \text{server } i\ } \text{User } K.$$

The above formulation leads to a single-server problem [7] with files of size $\alpha_i x_i F$ bits. It can be easily verified that the proposed cache placement strategy for each sub-file mimics that of [7] for single-server problems. Therefore, if we demonstrate that this server is able to send a common message of size $x_i \frac{F}{m}$ symbols to all $p_i$-subsets of users, then this server can handle this single-server problem successfully. In the above scheduling scheme, the server benefits each $p_i$-subset of the users by a common message of size $x \frac{F}{m}$ symbols (a pico-file size), $\gamma_i$ times. Consequently, the total volume of common message that this server is able to send to each $p_i$-subset is $\gamma_i \cdot x \frac{F}{m} = x_i \frac{F}{m}$ symbols.

Since, by a proper scheduling scheme in flexible networks, all servers can perform the same task simultaneously, all requested portions of files will be delivered. It should be noted that the portion of each file assigned to the virtual server is $x_{L+1} = 0$. Algorithm 1 presents the pseudo-code of the procedure described above.
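The counting argument in the correctness proof (each $p_i$-subset is served by server i exactly $\gamma_i$ times) can be checked by brute force for small parameters. The following sketch is illustrative; the helper names `ordered_partitions` and `slot_counts` are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations

def ordered_partitions(users, sizes):
    """Yield tuples of disjoint frozensets with the given sizes that cover `users`."""
    if not sizes:
        if not users:
            yield ()
        return
    for block in combinations(sorted(users), sizes[0]):
        rest = users - set(block)
        for tail in ordered_partitions(rest, sizes[1:]):
            yield (frozenset(block),) + tail

def slot_counts(K, parts, Q=0):
    """Count transmit slots and how often each subset is served by each real server."""
    sizes = list(parts) + [Q]            # the last block belongs to the virtual server
    users = frozenset(range(1, K + 1))
    served = [Counter() for _ in parts]  # served[i][subset] = number of slots
    slots = 0
    for partition in ordered_partitions(users, sizes):
        slots += 1
        for i in range(len(parts)):
            served[i][partition[i]] += 1
    return slots, served

slots, served = slot_counts(5, (2, 2), Q=1)
print(slots, set(served[0].values()))
```

With K = 5, $p_1 = p_2 = 2$ and Q = 1, there are $5!/(2!\,2!\,1!) = 30$ slots, and each of the 10 pairs of users is served by server 1 exactly $\gamma_1 = 3$ times.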

To prove Theorem [3] we first state the following lemma: 































Algorithm 1 Multi-Server Coded Caching - Flexible Networks
procedure PLACEMENT($W_1, \dots, W_N, p_1, \dots, p_{L+1}$)
    $\alpha_i \leftarrow \binom{K}{p_i - 1}$, $i = 1, \dots, L$
    $\gamma_i \leftarrow ((K - p_i)!\, p_i!) / (p_1! \cdots p_{L+1}!)$, $i = 1, \dots, L+1$
    $x \leftarrow 1 / (\sum_{i=1}^{L} \alpha_i \gamma_i)$
    $x_i \leftarrow \gamma_i x$, $i = 1, \dots, L$
    $x_{L+1} \leftarrow 0$
    for all $n \in [N]$ do
        split $W_n$ into $(W_n^i : i = 1, \dots, L)$, where $|W_n^i| = \alpha_i x_i F$
        for all $i = 1, \dots, L$ do
            split $W_n^i$ into $(W^i_{n,\tau_i} : \tau_i \subset [K], |\tau_i| = p_i - 1)$ of equal size
            for all $\tau_i \subset [K]$, $|\tau_i| = p_i - 1$ do
                split $W^i_{n,\tau_i}$ into $(W^{i,j}_{n,\tau_i} : j = 1, \dots, \gamma_i)$ of equal size
            end for
        end for
    end for
    for all $k \in [K]$ do
        for all $i = 1, \dots, L$ do
            $Z_k \leftarrow (W^{i,j}_{n,\tau_i} : \tau_i \subset [K], |\tau_i| = p_i - 1, k \in \tau_i, j = 1, \dots, \gamma_i, n \in [N])$
        end for
    end for
end procedure

procedure DELIVERY($W_1, \dots, W_N, d_1, \dots, d_K, p_1, \dots, p_{L+1}$)
    for all $i = 1, \dots, L$ do
        for all $j = 1, \dots, \binom{K}{p_i}$ do
            $N(P^i_j) \leftarrow 1$
        end for
    end for
    for all partitions of $[K]$ with sizes $p_1, \dots, p_{L+1}$ ($p_i \ge 2$, $i = 1, \dots, L$) do
        $\{P^1_{\theta_1}, \dots, P^{L+1}_{\theta_{L+1}}\} \leftarrow$ selected partition
        transmit $X(\{P^1_{\theta_1}, \dots, P^{L+1}_{\theta_{L+1}}\}) = \big[ \bigoplus_{r \in P^1_{\theta_1}} W^{1,N(P^1_{\theta_1})}_{d_r, P^1_{\theta_1} \setminus \{r\}};\ \dots;\ \bigoplus_{r \in P^L_{\theta_L}} W^{L,N(P^L_{\theta_L})}_{d_r, P^L_{\theta_L} \setminus \{r\}} \big]$
        for all $i = 1, \dots, L$ do
            $N(P^i_{\theta_i}) \leftarrow N(P^i_{\theta_i}) + 1$
        end for
    end for
end procedure


Lemma 1. The coding delay for a general network with L servers is lower bounded by

$$D^*(M) \ge \max_{s \in \{1, \dots, K\}} \frac{1}{\min(s, L)} \left( s - \frac{s}{\lfloor N/s \rfloor} M \right) \frac{F}{m}. \tag{24}$$


Proof: See Appendix A for the proof. ■
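As a quick numerical illustration of Lemma 1, the bound can be evaluated directly. This is a sketch under stated assumptions: the function name `cutset_lower_bound` is hypothetical, the result is in units of F/m, and values of s larger than N are skipped because $\lfloor N/s \rfloor = 0$ there (an assumption about how the maximization should be read).

```python
from fractions import Fraction

def cutset_lower_bound(K, N, L, M):
    """Evaluate the lower bound of Lemma 1 in units of F/m."""
    M = Fraction(M)
    best = Fraction(0)
    for s in range(1, min(K, N) + 1):       # s > N would give floor(N/s) = 0
        cut = Fraction(s) - Fraction(s, N // s) * M
        best = max(best, cut / min(s, L))
    return best

# Setting of Example 5: L = 2 servers, K = 3 users, N = 3 files.
for M in (0, Fraction(1, 3), 1):
    print(M, cutset_lower_bound(3, 3, 2, M))
```

For the setting of Example 5 the bound evaluates to 3/2 at M = 0 and to 2/3 at M = 1.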

The above lemma can be used to prove optimality of the proposed scheme in some range of parameters. The following 
corollary states the result. 

Corollary 1. All $(M, T_C)$ pairs in Theorem 2 corresponding to Q = 0 are optimal. Thus, the converse line $(1 - \frac{M}{N}) \frac{F}{m}$ is achieved for $M^* \le M \le N$, where

$$M^* = \min_{p_1 + \dots + p_L = K} \frac{N}{K}\, \frac{\sum_{i=1}^{L} \frac{p_i (p_i - 1)}{K - p_i + 1}}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}}. \tag{25}$$

Proof: Theorem 2 states that all the $(M, T_C)$ pairs in (16) are achievable. By some simple calculations one can show that for these achievable pairs we have:

$$T_C = \frac{K}{K - Q} \left( 1 - \frac{M}{N} \right) \frac{F}{m}. \tag{26}$$

Therefore, if we put Q = 0 in Theorem 2, all the corresponding $(M, T_C)$ pairs satisfy

$$T_C = \left( 1 - \frac{M}{N} \right) \frac{F}{m}.$$

On the other hand, by considering the case of s = 1 in Lemma 1, we know that the optimal coding delay satisfies:

$$D^*(M) \ge \left( 1 - \frac{M}{N} \right) \frac{F}{m},$$

which matches our achievable coding delay. Therefore, by setting Q = 0 in Theorem 2, for all

$$M = \frac{N}{K}\, \frac{\sum_{i=1}^{L} \frac{p_i (p_i - 1)}{K - p_i + 1}}{\sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}}, \qquad p_1 + \dots + p_L = K,\ p_i \ge 2,$$

the achievable coding delay is optimum. By minimizing the cache size over all partitionings satisfying $p_1 + \dots + p_L = K$, $p_i \ge 2$, the proof is complete. ■

There is an interesting intuition behind Eq. (26). In the proposed scheme for flexible networks, we assigned a subset of Q users to the virtual server, and all the other K - Q users benefited from other servers. Thus, through each transmission, the ratio $\frac{K - Q}{K}$ of users will be real users. This is exactly the coefficient that shows how close the achieved delay is to the optimal curve $(1 - M/N) F/m$.
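The "simple calculations" behind Eq. (26) can be spelled out from the memory expression (22) and the delay expression (23). The derivation below is a reconstruction, and the shorthand $S$ is introduced here for convenience and does not appear in the original.

```latex
\begin{align*}
S &:= \sum_{i=1}^{L} \frac{p_i}{K - p_i + 1}, \qquad
T_C = \frac{1}{S}\,\frac{F}{m}, \qquad
\frac{M}{N} = \frac{1}{K S} \sum_{i=1}^{L} \frac{p_i (p_i - 1)}{K - p_i + 1}, \\
1 - \frac{M}{N}
  &= \frac{1}{K S} \sum_{i=1}^{L} \frac{K p_i - p_i (p_i - 1)}{K - p_i + 1}
   = \frac{1}{K S} \sum_{i=1}^{L} p_i
   = \frac{K - Q}{K S}, \\
T_C &= \frac{1}{S}\,\frac{F}{m}
   = \frac{K}{K - Q} \left( 1 - \frac{M}{N} \right) \frac{F}{m}.
\end{align*}
```

The second line uses $K p_i - p_i (p_i - 1) = p_i (K - p_i + 1)$ and $p_1 + \dots + p_L = K - Q$.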

Finally, we are ready to prove Theorem 3. We consider two regimes for cache sizes. First, we let

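In the first regime where $M \ge M^*$, using Theorem 2 with Q = 0 and $p_1 = \dots = p_L = \frac{K}{L}$, we obtain:

$$T_C = \left( 1 - \frac{M}{N} \right) \frac{F}{m}.$$

As Corollary 1 states, for this case the optimal curve is achieved.

For the second regime where $M < M^*$ (such that $KM/N \in \mathbb{N}$), set

$$Q = K - \left( \frac{KM}{N} + 1 \right) L, \qquad p_1 = \dots = p_L = \frac{K - Q}{L} = \frac{KM}{N} + 1.$$

Then, we obtain:

$$T_C = \frac{1}{L}\, \frac{K (1 - M/N)}{1 + KM/N}\, \frac{F}{m}.$$

On the other hand, from Lemma 1 we have:

$$\begin{aligned}
D^* &\ge \max_{s \in \{1, \dots, K\}} \frac{1}{\min(s, L)} \left( s - \frac{s}{\lfloor N/s \rfloor} M \right) \frac{F}{m} \\
&\ge \frac{1}{L} \max_{s \in \{1, \dots, K\}} \left( s - \frac{s}{\lfloor N/s \rfloor} M \right) \frac{F}{m} \\
&\ge \frac{1}{L}\, \frac{1}{12}\, \frac{K (1 - M/N)}{1 + KM/N}\, \frac{F}{m} \\
&= \frac{1}{12}\, T_C,
\end{aligned} \tag{27}$$

where the last inequality follows from [7]. This concludes the proof of Theorem 3.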















Fig. 10. Example 5 (L = 2, K = 3, N = 3): Lower and upper bounds on the coding delay.


V. Linear Networks: Details 


In order to explain the main concepts behind the coding strategy proposed for linear networks, we will first present a simple 
example: 


Example 5 (L = 2, K = 3, N = 3). In this example, we consider a network consisting of L = 2 servers, K = 3 users, and a library of N = 3 files, namely $W_1 = A$, $W_2 = B$, and $W_3 = C$. By the definition of linear networks, the input-output relation of this network is characterized by a 3-by-2 random matrix H. Lower and upper bounds for the coding delay of this setting are shown in Fig. 10. The lower bound is due to Lemma 1 as follows:

$$D^* \ge \max \left( 1 - \frac{M}{3},\ \frac{3 - 3M}{2} \right) \frac{F}{m}. \tag{28}$$

The upper bound is due to Theorem 4, except for the achievable pair $(M, T_C) = (\frac{1}{3}, 1)$, which will be discussed later. We have also exploited the fact that the straight line connecting every two achievable points on the $M$-$T_C$ curve is also achievable, as shown in [7]. In order to get a glimpse of the ideas of the coding strategy behind Theorem 4, we next discuss the achievable $(M, T_C)$ pair $(1, \frac{2}{3})$. In this case, as we will show, we can benefit both from the local/global caching gain (provided by the cache of each user) and from the multiplexing gain (provided by multiple servers in the network). The question is how to design a scheme that exploits both gains simultaneously. In what follows we provide the solution.

Suppose that (without loss of generality) in the second phase, the first, second, and third users request files A, B, and C respectively. Assume that the cache content placement is similar to that of [7]: First, divide each file into three equal-sized non-overlapping sub-files:

$$A = [A_1, A_2, A_3], \qquad B = [B_1, B_2, B_3], \qquad C = [C_1, C_2, C_3].$$

Then, put the following contents in the cache of users:

$$Z_1 = [A_1, B_1, C_1], \qquad Z_2 = [A_2, B_2, C_2], \qquad Z_3 = [A_3, B_3, C_3].$$

Let $\mathcal{L}(x_1, \dots, x_m)$ be a random linear combination of $x_1, \dots, x_m$ as defined earlier. Consequently, in this strategy, the two servers send the following transmit block:

$$\begin{aligned}
\mathbf{X} = \big[\, &\mathbf{h}_1^{\perp} \mathcal{L}_1^1(C_2, B_3) + \mathbf{h}_2^{\perp} \mathcal{L}_2^1(A_3, C_1) + \mathbf{h}_3^{\perp} \mathcal{L}_3^1(A_2, B_1), \\
&\mathbf{h}_1^{\perp} \mathcal{L}_1^2(C_2, B_3) + \mathbf{h}_2^{\perp} \mathcal{L}_2^2(A_3, C_1) + \mathbf{h}_3^{\perp} \mathcal{L}_3^2(A_2, B_1) \,\big],
\end{aligned} \tag{29}$$

where the random linear combination operator $\mathcal{L}(\cdot, \cdot)$ operates on sub-files in an element-wise manner, and $\mathbf{h}_i^{\perp}$ is a vector orthogonal to $\mathbf{h}_i$ (i.e., $\mathbf{h}_i \cdot \mathbf{h}_i^{\perp} = 0$). Let us focus on the first user, who will receive:

$$\begin{aligned}
\mathbf{h}_1 \cdot \mathbf{X} &= \big[\, (\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}) \mathcal{L}_2^1(A_3, C_1) + (\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}) \mathcal{L}_3^1(A_2, B_1),\ (\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}) \mathcal{L}_2^2(A_3, C_1) + (\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}) \mathcal{L}_3^2(A_2, B_1) \,\big] \\
&= \big[\, \mathcal{L}^1(A_2, A_3, C_1, B_1),\ \mathcal{L}^2(A_2, A_3, B_1, C_1) \,\big].
\end{aligned} \tag{30}$$

As the first user has already cached $B_1$ and $C_1$ in the first phase, by subtracting the effect of these interference terms, the first user can recover:

$$[\mathcal{L}(A_2, A_3),\ \mathcal{L}'(A_2, A_3)],$$

which consists of two independent (with high probability for large field size q) linear combinations of $A_2$ and $A_3$. By solving these independent linear equations, this user can decode $A_2$ and $A_3$, and with the help of $A_1$ cached in the first phase, he can recover the whole requested file A. It can easily be verified that the other users can also decode their requested files in a similar fashion. The transmit block size indicated in (29) is 2-by-$\frac{2F}{3m}$, resulting in $T_C = \frac{2F}{3m}$ time slots.
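The decoding step at the first user can be simulated over a small prime field. The sketch below is only an illustration under simplifying assumptions that are not in the original: each sub-file is a single field symbol, the prime `P = 7919` stands in for the field size q, the random coefficients of the linear combinations are known to the receiver, and a draw is repeated if the 2-by-2 system happens to be singular. All function names are hypothetical.

```python
import random

P = 7919  # a prime standing in for the field size q (assumption)

def perp(h):
    """Return a 2-vector orthogonal to h over GF(P)."""
    return (h[1] % P, (-h[0]) % P)

def dot(a, b):
    return (a[0] * b[0] + a[1] * b[1]) % P

def run(seed=0):
    rng = random.Random(seed)
    rnd = lambda: rng.randrange(1, P)
    # One symbol per sub-file; index 0 holds A_1, index 1 holds A_2, and so on.
    A, B, C = ([rnd() for _ in range(3)] for _ in range(3))
    while True:
        H = [(rnd(), rnd()) for _ in range(3)]          # rows h_1, h_2, h_3
        hp = [perp(h) for h in H]                       # h_1^perp, h_2^perp, h_3^perp
        coef = [[(rnd(), rnd()) for _ in range(3)] for _ in range(2)]
        # terms[j] is the pair combined in the term that is nulled at user j+1
        terms = [(C[1], B[2]), (A[2], C[0]), (A[1], B[0])]
        X = []
        for w in range(2):                              # two versions of the block
            col = [0, 0]
            for j in range(3):
                comb_w = (coef[w][j][0] * terms[j][0] + coef[w][j][1] * terms[j][1]) % P
                col = [(col[s] + hp[j][s] * comb_w) % P for s in range(2)]
            X.append(col)
        # User 1 observes h_1 . X and removes the cached C_1 and B_1 contributions.
        g2, g3 = dot(H[0], hp[1]), dot(H[0], hp[2])
        y = [dot(H[0], X[w]) for w in range(2)]
        rhs = [(y[w] - g2 * coef[w][1][1] * C[0] - g3 * coef[w][2][1] * B[0]) % P
               for w in range(2)]
        # Remaining system: rhs_w = m[w][0] * A_3 + m[w][1] * A_2
        m = [[g2 * coef[w][1][0] % P, g3 * coef[w][2][0] % P] for w in range(2)]
        det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % P
        if det:
            break
    inv = pow(det, -1, P)                               # modular inverse (Python 3.8+)
    A3 = (rhs[0] * m[1][1] - rhs[1] * m[0][1]) * inv % P
    A2 = (m[0][0] * rhs[1] - m[1][0] * rhs[0]) * inv % P
    return (A2, A3), (A[1], A[2])

decoded, truth = run()
print(decoded == truth)
```

The term multiplied by $\mathbf{h}_1^{\perp}$ never reaches user 1, so two received symbols suffice to solve for the two missing sub-files once the cached interference is removed.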

Let us forget about one of the servers for a moment and assume we have just one server. Then, the scheme proposed in [7] only benefits two users per transmission through pure global caching gain. Also, in the case of two servers and no cache memory (the aforementioned case of M = 0), we could design a scheme that benefits only two users through pure multiplexing gain. However, through the proposed strategy, we have designed a scheme that exploits both the global caching and multiplexing gains, such that all three users can take advantage of each transmission.

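Finally, let us discuss the achievable pair $(M, T_C) = (\frac{1}{3}, 1)$, where we need to adopt a different strategy. Assume we divide each of the files A, B, and C into three equal parts and fill the caches as follows:

$$Z_1 = [A_1 + B_1 + C_1], \qquad Z_2 = [A_2 + B_2 + C_2], \qquad Z_3 = [A_3 + B_3 + C_3].$$

Consequently, the servers transmit the following vectors:

$$\begin{aligned}
\mathbf{X}_1 &= \frac{\mathbf{h}_3^{\perp}}{\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}}\, B_1 + \frac{\mathbf{h}_2^{\perp}}{\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}}\, C_1, \\
\mathbf{X}_2 &= \frac{\mathbf{h}_3^{\perp}}{\mathbf{h}_2 \cdot \mathbf{h}_3^{\perp}}\, A_2 + \frac{\mathbf{h}_1^{\perp}}{\mathbf{h}_2 \cdot \mathbf{h}_1^{\perp}}\, C_2, \\
\mathbf{X}_3 &= \frac{\mathbf{h}_2^{\perp}}{\mathbf{h}_3 \cdot \mathbf{h}_2^{\perp}}\, A_3 + \frac{\mathbf{h}_1^{\perp}}{\mathbf{h}_3 \cdot \mathbf{h}_1^{\perp}}\, B_3.
\end{aligned} \tag{31}$$

It can be easily verified that the first user receives $A_2$, $A_3$, and $B_1 + C_1$ (the first two up to known nonzero scalar factors). So, with the help of its cache content, it can decode the whole file A. Similarly, the other users can decode their requested files. As each block $\mathbf{X}_i$ is a 2-by-$\frac{F}{3m}$ matrix of symbols, the total delay required to fulfill the users' demands is $T_C = \frac{F}{m}$ time slots.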


Example 4 also discusses the coding delay for a linear network with four users. The details of the coding strategy of Example 4, which are provided in Appendices B and C, further clarify the basic ideas behind the proposed scheme. In the rest of this section, we provide the formal proof of Theorem 4.


Cache Placement Strategy: The cache content placement phase is identical to [7]: Define $t = MK/N$, and divide each file into $\binom{K}{t}$ non-overlapping sub-files¹:

$$W_n = (W_{n,\tau} : \tau \subset [K],\ |\tau| = t), \qquad n = 1, \dots, N,$$

where each sub-file consists of $F / \binom{K}{t}$ bits. In the first phase, we store the sub-file $W_{n,\tau}$ in the cache of user k if $k \in \tau$. Therefore, the total amount of cache each user needs for this placement is:

$$N\, \frac{F}{\binom{K}{t}} \binom{K - 1}{t - 1} = MF$$

bits.

We further divide each sub-file into $\binom{K - t - 1}{L - 1}$ non-overlapping equal-sized mini-files as follows:

$$W_{n,\tau} = \left( W^j_{n,\tau} : j = 1, \dots, \binom{K - t - 1}{L - 1} \right).$$

Thus, each mini-file consists of $F / \left( \binom{K}{t} \binom{K - t - 1}{L - 1} \right)$ bits.

¹ It should be noted that the definition of sub-files and mini-files here differs from that of flexible networks.

Content Delivery Strategy: Consider an arbitrary $(t + L)$-subset of users denoted by S (i.e., $S \subset [K]$, $|S| = t + L$). For this specific subset S, denote all $(t + 1)$-subsets of S by $T_i$, $i = 1, \dots, \binom{t + L}{t + 1}$ (i.e., $T_i \subset S$, $|T_i| = t + 1$). First, we assign an L-by-1 vector $\mathbf{u}_S^{T_i}$ to each $T_i$ such that

$$\mathbf{u}_S^{T_i} \perp \mathbf{h}_j \ \text{ for all } j \in S \setminus T_i, \qquad \mathbf{u}_S^{T_i} \not\perp \mathbf{h}_j \ \text{ for all } j \in T_i. \tag{32}$$


The following lemma specifies the required field size such that the aforementioned condition is met with high probability:

Lemma 2. If the elements of the network transfer matrix H are uniformly and independently chosen from $\mathbb{F}_q$, then we can find vectors which satisfy (32) with high probability if:

$$q \gg (t + 1) \binom{K}{t + L} \binom{t + L}{t + 1}. \tag{33}$$

Proof: First, since the set $S \setminus T$ has $L - 1$ elements, we require $\mathbf{u}_S^{T}$ to be orthogonal to $L - 1$ arbitrary vectors, which is feasible in an L-dimensional space of any field size.

Second, the total number of non-orthogonality constraints in (32) for all possible subsets S is $(t + 1) \binom{K}{t + L} \binom{t + L}{t + 1}$. On the other hand, it can be easily verified that the probability that two uniformly chosen random vectors in $\mathbb{F}_q$ are orthogonal is $1/q$. Thus, by using the union bound, the probability that at least one non-orthogonality constraint in (32) is violated is upper bounded by

$$(t + 1) \binom{K}{t + L} \binom{t + L}{t + 1} \frac{1}{q} \ll 1,$$

which concludes the proof.
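To get a feel for the field size required by Lemma 2, the number of constraints and the resulting union bound can be computed directly. This is a small sketch; the function names are hypothetical.

```python
from math import comb

def nonorthogonality_constraints(K, t, L):
    """Number of constraints u_S^T not orthogonal to h_j over all S, all T in S, all j in T."""
    return (t + 1) * comb(K, t + L) * comb(t + L, t + 1)

def violation_bound(K, t, L, q):
    """Union bound on the probability that some constraint in (32) fails."""
    return nonorthogonality_constraints(K, t, L) / q

print(nonorthogonality_constraints(4, 1, 2), violation_bound(4, 1, 2, 2**16))
```

For K = 4, t = 1, L = 2 there are 24 constraints, so a field with $q = 2^{16}$ elements already pushes the union bound below $10^{-3}$.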
For each $T_i$, define:

$$G(T_i) = \mathcal{L}_{r \in T_i} \left( W^{j}_{d_r,\, T_i \setminus \{r\}} \right), \tag{34}$$

where $W^{j}_{d_r, T_i \setminus \{r\}}$ is a mini-file which is available in the cache of all users in $T_i$ except r, and is required by user r. Also, $\mathcal{L}_{r \in T_i}$ represents a random linear combination of the corresponding mini-files for all $r \in T_i$. Note that the index j is chosen such that these mini-files have not been observed in the previous $(t + L)$-subsets. Thus, if we define $N(r, T \setminus \{r\})$ as the index of the next fresh mini-file required by user r, which is present in the cache of users $T \setminus \{r\}$, then we can rewrite:

$$G(T_i) = \mathcal{L}_{r \in T_i} \left( W^{N(r,\, T_i \setminus \{r\})}_{d_r,\, T_i \setminus \{r\}} \right). \tag{35}$$

Subsequently, we make the following definition for such a $(t + L)$-subset S:

$$\mathbf{X}(S) = \sum_{T \subseteq S,\, |T| = t + 1} \mathbf{u}_S^{T}\, G(T). \tag{36}$$

We repeat the above procedure $\binom{t + L - 1}{t}$ times for the given $(t + L)$-subset S in order to derive different independent versions $\mathbf{X}_\omega(S)$, $\omega = 1, \dots, \binom{t + L - 1}{t}$. In other words, the $\mathbf{X}_\omega(S)$'s only differ in the random coefficients chosen for calculating the linear combinations in (35), which makes them independent linear combinations of the corresponding mini-files, with high probability. Thus, to distinguish between these different versions notationally, we define:

$$G_\omega(T_i) = \mathcal{L}^{\omega}_{r \in T_i} \left( W^{N(r,\, T_i \setminus \{r\})}_{d_r,\, T_i \setminus \{r\}} \right), \qquad \mathbf{X}_\omega(S) = \sum_{T \subseteq S,\, |T| = t + 1} \mathbf{u}_S^{T}\, G_\omega(T). \tag{37}$$

Subsequently, for this $(t + L)$-subset S, the servers transmit the block

$$\left[ \mathbf{X}_1(S), \dots, \mathbf{X}_{\binom{t + L - 1}{t}}(S) \right], \tag{38}$$

and we update $N(r, T \setminus \{r\})$ for those mini-files which have appeared in the linear combinations in (35). When the above procedure for this specific subset S is completed, we consider another $(t + L)$-subset of users and repeat the procedure for that subset, continuing until all $(t + L)$-subsets of $[K]$ have been taken into account.

Next, let us calculate the coding delay of this scheme, after which we prove the correctness of this content delivery strategy. For a fixed $(t + L)$-subset S, each $\mathbf{X}_\omega(S)$ is an L-by-$\frac{F}{\binom{K}{t} \binom{K - t - 1}{L - 1} m}$ block of symbols. Thus, the transmit block for S, i.e.

$$\left[ \mathbf{X}_1(S), \dots, \mathbf{X}_{\binom{t + L - 1}{t}}(S) \right],$$

is an L-by-$\binom{t + L - 1}{t} \frac{F}{\binom{K}{t} \binom{K - t - 1}{L - 1} m}$ block. Since this transmission should be repeated for all $\binom{K}{t + L}$ $(t + L)$-subsets of users, the whole transmit block size will be

$$L\text{-by-}\ \frac{\binom{t + L - 1}{t} \binom{K}{t + L}}{\binom{K}{t} \binom{K - t - 1}{L - 1}}\, \frac{F}{m} \;=\; L\text{-by-}\ \frac{K (1 - M/N)}{L + MK/N}\, \frac{F}{m},$$

which will result in the coding delay of

$$T_C = \frac{K (1 - M/N)}{L + MK/N}\, \frac{F}{m} \tag{39}$$

time slots. Algorithm 2 shows the pseudo-code of the aforementioned procedure for linear networks.
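The simplification of the binomial expression into the closed form (39) can be checked exhaustively for small parameters. In this sketch (function names hypothetical), delays are in units of F/m and $t = MK/N$, so that $K(1 - M/N) = K - t$.

```python
from fractions import Fraction
from math import comb

def delay_from_counting(K, t, L):
    """Delay obtained by counting mini-file-sized blocks over all (t+L)-subsets."""
    assert t + L <= K
    mini_file = Fraction(1, comb(K, t) * comb(K - t - 1, L - 1))  # in units of F
    per_subset = comb(t + L - 1, t) * mini_file                   # versions per subset S
    return comb(K, t + L) * per_subset

def delay_closed_form(K, t, L):
    """Closed form (K - t) / (L + t), i.e. K(1 - M/N) / (L + MK/N)."""
    return Fraction(K - t, L + t)

checked = 0
for K in range(2, 9):
    for L in range(1, K + 1):
        for t in range(0, K - L + 1):
            assert delay_from_counting(K, t, L) == delay_closed_form(K, t, L)
            checked += 1
print(checked, "parameter triples agree")
```

The check covers only the regime $t + L \le K$; beyond it, Theorem 4 caps the gain at K.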


Correctness Proof: Consider user k, who is interested in acquiring the file $W_{d_k}$. This file is partitioned into two parts:

1- The part already cached at this user in the first phase, which consists of the sub-files:

$$(W_{d_k,\tau} : \tau \subset [K],\ |\tau| = t,\ k \in \tau). \tag{40}$$

2- The part which should be delivered to this user through the content delivery strategy, which consists of the sub-files:

$$(W_{d_k,\tau} : \tau \subset [K],\ |\tau| = t,\ k \notin \tau). \tag{41}$$

Since, by Lemma 4 below, the sub-files in the second category are successfully delivered to this user through the content delivery strategy, this user will decode the requested file. Moreover, since this user was arbitrarily chosen, all users will similarly decode their requested files.

Before proving Lemma 4, we need another lemma, which is proved first:

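Lemma 3. Consider an arbitrary subset $T \subset [K]$ such that $|T| = t + 1$, and $k \in T$. Then, through the above content placement and delivery strategy, user k will be able to decode the sub-file $W_{d_k,\, T \setminus \{k\}}$.


Proof: Consider those transmissions which are assigned to the $(t + L)$-subsets which contain T. There exist $\binom{K - t - 1}{L - 1}$ such subsets. Let us focus on one of them, namely S. Corresponding to S, the following transmit block is sent by the servers:

$$\left[ \mathbf{X}_1(S), \dots, \mathbf{X}_{\binom{t + L - 1}{t}}(S) \right], \tag{42}$$

and subsequently, user k receives:

$$\mathbf{h}_k \cdot \left[ \mathbf{X}_1(S), \dots, \mathbf{X}_{\binom{t + L - 1}{t}}(S) \right]. \tag{43}$$

Let us focus on $\mathbf{h}_k \cdot \mathbf{X}_1(S)$:

$$\begin{aligned}
\mathbf{h}_k \cdot \mathbf{X}_1(S) &\overset{(a)}{=} \mathbf{h}_k \cdot \sum_{T \subseteq S,\, |T| = t + 1} \mathbf{u}_S^{T}\, G_1(T) \\
&\overset{(b)}{=} \sum_{T \subseteq S,\, |T| = t + 1,\, k \in T} \left( \mathbf{h}_k \cdot \mathbf{u}_S^{T} \right) G_1(T) \\
&\overset{(c)}{=} \sum_{T \subseteq S,\, |T| = t + 1,\, k \in T} \left( \mathbf{h}_k \cdot \mathbf{u}_S^{T} \right) \mathcal{L}^{1}_{r \in T} \left( W^{N(r,\, T \setminus \{r\})}_{d_r,\, T \setminus \{r\}} \right),
\end{aligned} \tag{44}$$

where (a) follows from the definition of $\mathbf{X}_1(S)$, (b) follows from the fact that

$$\mathbf{u}_S^{T} \perp \mathbf{h}_k \ \text{ for all } k \in S \setminus T, \tag{45}$$

and (c) is due to (34). In (44), user k can extract $W^{N(k,\, T \setminus \{k\})}_{d_k,\, T \setminus \{k\}}$ from the linear combination $\mathcal{L}^{1}_{r \in T} ( W^{N(r,\, T \setminus \{r\})}_{d_r,\, T \setminus \{r\}} )$, since all the other interference terms are present in his cache. Thus, by removing the interference terms, user k can carve the following linear combination from (44):

$$\mathcal{L}_{T \subseteq S,\, |T| = t + 1,\, k \in T} \left( W^{N(k,\, T \setminus \{k\})}_{d_k,\, T \setminus \{k\}} \right),$$

which is a random linear combination of $\binom{t + L - 1}{t}$ mini-files desired by user k. However, since in (43) user k receives $\binom{t + L - 1}{t}$ independent random linear combinations of these mini-files, he can recover the whole set of mini-files:

$$\left( W^{N(k,\, T \setminus \{k\})}_{d_k,\, T \setminus \{k\}} : T \subseteq S,\ |T| = t + 1,\ k \in T \right).$$

Thus, for the T specified in this lemma, he can recover the mini-file $W^{N(k,\, T \setminus \{k\})}_{d_k,\, T \setminus \{k\}}$. Now, since there exist a total of $\binom{K - t - 1}{L - 1}$ $(t + L)$-subsets containing this specific T, by considering the transmissions corresponding to each, this user will recover $\binom{K - t - 1}{L - 1}$ distinct mini-files of the form $W^{j}_{d_k,\, T \setminus \{k\}}$. The distinctness is guaranteed by the appropriate updating of the index $N(\cdot, \cdot)$. These mini-files together recover the sub-file $W_{d_k,\, T \setminus \{k\}}$, and the proof is concluded. ■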


Lemma 4. Through the above content delivery strategy, an arbitrary user k will be able to decode all the sub-files:

$$(W_{d_k,\tau} : \tau \subset [K],\ |\tau| = t,\ k \notin \tau). \tag{46}$$

Proof: Consider an arbitrary $\tau \subset [K]$ such that $|\tau| = t$, $k \notin \tau$. Define $T = \tau \cup \{k\}$. Then, due to Lemma 3, user k is able to decode $W_{d_k,\tau}$. Since $\tau$ was chosen arbitrarily, the proof is complete. ■






Algorithm 2 Multi-Server Coded Caching - Linear Networks
procedure PLACEMENT($W_1, \dots, W_N$)
    $t \leftarrow MK/N$
    for all $n \in [N]$ do
        split $W_n$ into $(W_{n,\tau} : \tau \subset [K], |\tau| = t)$ of equal size
        for all $\tau \subset [K]$, $|\tau| = t$ do
            split $W_{n,\tau}$ into $(W^j_{n,\tau} : j = 1, \dots, \binom{K - t - 1}{L - 1})$ of equal size
        end for
    end for
    for all $k \in [K]$ do
        $Z_k \leftarrow (W^j_{n,\tau} : \tau \subset [K], |\tau| = t, k \in \tau, j = 1, \dots, \binom{K - t - 1}{L - 1}, n \in [N])$
    end for
end procedure

procedure DELIVERY($W_1, \dots, W_N, d_1, \dots, d_K$)
    $t \leftarrow MK/N$
    for all $T \subset [K]$, $|T| = t + 1$ do
        for all $r \in T$ do
            $N(r, T \setminus \{r\}) \leftarrow 1$
        end for
    end for
    for all $S \subset [K]$, $|S| = t + L$ do
        for all $T \subset S$, $|T| = t + 1$ do
            Design $\mathbf{u}_S^{T}$ such that: for all $j \in S$, $\mathbf{h}_j \perp \mathbf{u}_S^{T}$ if $j \notin T$ and $\mathbf{h}_j \not\perp \mathbf{u}_S^{T}$ if $j \in T$
        end for
        for all $\omega = 1, \dots, \binom{t + L - 1}{t}$ do
            for all $T \subset S$, $|T| = t + 1$ do
                $G_\omega(T) \leftarrow \mathcal{L}^{\omega}_{r \in T} ( W^{N(r, T \setminus \{r\})}_{d_r, T \setminus \{r\}} )$
            end for
            $\mathbf{X}_\omega(S) \leftarrow \sum_{T \subset S, |T| = t + 1} \mathbf{u}_S^{T} G_\omega(T)$
        end for
        transmit $\mathbf{X}(S) = [\mathbf{X}_1(S), \dots, \mathbf{X}_{\binom{t + L - 1}{t}}(S)]$
        for all $T \subset S$, $|T| = t + 1$ do
            for all $r \in T$ do
                $N(r, T \setminus \{r\}) \leftarrow N(r, T \setminus \{r\}) + 1$
            end for
        end for
    end for
end procedure
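The index bookkeeping of the DELIVERY procedure (the counters $N(r, T \setminus \{r\})$) can be traced with a few lines of Python. This sketch tracks only which mini-file index is used in which subset S; it does not model the precoding vectors or the linear combinations, and the function name `delivery_schedule` is hypothetical.

```python
from itertools import combinations
from math import comb

def delivery_schedule(K, t, L):
    """Return, for every (t+L)-subset S, the mini-file index used for each (r, T minus r)."""
    users = range(1, K + 1)
    next_index = {}                       # plays the role of N(r, T \ {r})
    for T in combinations(users, t + 1):
        for r in T:
            next_index[(r, frozenset(T) - {r})] = 1
    schedule = []
    for S in combinations(users, t + L):
        used = {}
        for T in combinations(S, t + 1):
            for r in T:
                key = (r, frozenset(T) - {r})
                used[key] = next_index[key]
        schedule.append((S, used))
        for key in used:                  # these mini-files are no longer fresh
            next_index[key] += 1
    return schedule, next_index

schedule, final_index = delivery_schedule(4, 1, 2)
for S, used in schedule:
    print(S, sorted((r, sorted(tau), j) for (r, tau), j in used.items()))
```

For K = 4, t = 1, L = 2, every pair (r, T \ {r}) is used in exactly $\binom{K - t - 1}{L - 1} = 2$ subsets, so each sub-file's two mini-files are delivered exactly once.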


VI. Conclusions 

In this paper, we investigated coded caching in a multi-server network where servers are connected to multiple cache-enabled clients. Based on the topology of the network, we defined three types of networks, namely, dedicated, flexible, and linear networks. In dedicated and flexible networks, we assume that the internal nodes are aware of the network topology, and accordingly route the data. In linear networks, we assume no topology knowledge at internal nodes, and thus, internal nodes perform random linear network coding. We have shown that knowledge of the type of network topology plays a key role in the design of proper caching mechanisms in such networks. Our results show that all network types can benefit from both caching and multiplexing gains. In fact, in dedicated and linear networks the global caching and multiplexing gains appear in additive form. However, in flexible networks they appear in multiplicative form, leading to an order-optimal solution in terms of coding delay.

Fig. 11. Converse Proof.


Appendix A: Converse Proof 


The proof is similar to the cut-set method presented in [7]. See Fig. 11 and let us concentrate on the first s users. Define $\mathbf{X}_1$ to be the transmit block sent by the servers such that these users, with the help of their cache contents $Z_1, \dots, Z_s$, will be able to decode $W_1, \dots, W_s$. Also, define $\mathbf{X}_2$ to be the block which enables the users to decode $W_{s+1}, \dots, W_{2s}$, and continue the same process such that $\mathbf{X}_{\lfloor N/s \rfloor}$ is the block which enables the users to decode $W_{(\lfloor N/s \rfloor - 1)s + 1}, \dots, W_{s \lfloor N/s \rfloor}$. Also, define R to be the maximum information needed to pass through the two cuts shown in the figure, for each transmit block transmission. Then we will have:

$$\left\lfloor \frac{N}{s} \right\rfloor s F \le \left\lfloor \frac{N}{s} \right\rfloor R + s M F,$$

which will result in

$$R \ge \left( s - \frac{s}{\lfloor N/s \rfloor} M \right) F.$$

However, we have:

$$\begin{aligned}
D^*(M) &\ge \frac{R}{\text{min-cut}} \\
&\ge \frac{R}{\min(s, L)\, m} \\
&\ge \frac{1}{\min(s, L)} \left( s - \frac{s}{\lfloor N/s \rfloor} M \right) \frac{F}{m}.
\end{aligned} \tag{47}$$

Now we can maximize over the free parameter s to arrive at the tightest bound, which concludes the proof.


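Appendix B: Details of Example 4 (L = 2, N = 4, K = 4)

In this appendix, we consider the scenario in Example 4 for the case of two servers. For each memory size M = 0, ..., 4, we present the scheme which achieves the coding delay stated in Example 4.


• M = 0

In this case, we do not have any cache space available at the users. Suppose we divide each file into three equal-sized non-overlapping parts:

$$A = [A^1, A^2, A^3], \qquad B = [B^1, B^2, B^3], \qquad C = [C^1, C^2, C^3], \qquad D = [D^1, D^2, D^3].$$

Then, the servers transmit the following blocks, in sequence:

$$\begin{aligned}
\mathbf{X}(\{1,2\}) &= \mathbf{h}_1^{\perp} B^1 + \mathbf{h}_2^{\perp} A^1 \\
\mathbf{X}(\{1,3\}) &= \mathbf{h}_1^{\perp} C^1 + \mathbf{h}_3^{\perp} A^2 \\
\mathbf{X}(\{1,4\}) &= \mathbf{h}_1^{\perp} D^1 + \mathbf{h}_4^{\perp} A^3 \\
\mathbf{X}(\{2,3\}) &= \mathbf{h}_2^{\perp} C^2 + \mathbf{h}_3^{\perp} B^2 \\
\mathbf{X}(\{2,4\}) &= \mathbf{h}_2^{\perp} D^2 + \mathbf{h}_4^{\perp} B^3 \\
\mathbf{X}(\{3,4\}) &= \mathbf{h}_3^{\perp} D^3 + \mathbf{h}_4^{\perp} C^3.
\end{aligned} \tag{48}$$

Let us focus on the first user, which receives:

$$\begin{aligned}
\mathbf{h}_1 \cdot \mathbf{X}(\{1,2\}) &= (\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}) A^1 \\
\mathbf{h}_1 \cdot \mathbf{X}(\{1,3\}) &= (\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}) A^2 \\
\mathbf{h}_1 \cdot \mathbf{X}(\{1,4\}) &= (\mathbf{h}_1 \cdot \mathbf{h}_4^{\perp}) A^3.
\end{aligned}$$

From the above data, this user can recover the whole file A. Similarly, other users can decode their requested files.

The transmission stated in (48) consists of six blocks of size 2-by-$\frac{F}{3m}$, resulting in a coding delay of $T_C = 6 \frac{F}{3m} = 2 \frac{F}{m}$.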
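• M = 1

Consider the cache content placement used in [7]: First divide each file into 4 equal-sized non-overlapping sub-files:

$$A = [A_1, A_2, A_3, A_4], \qquad B = [B_1, B_2, B_3, B_4], \qquad C = [C_1, C_2, C_3, C_4], \qquad D = [D_1, D_2, D_3, D_4],$$

and then, fill the caches as follows:

$$Z_1 = [A_1, B_1, C_1, D_1], \quad Z_2 = [A_2, B_2, C_2, D_2], \quad Z_3 = [A_3, B_3, C_3, D_3], \quad Z_4 = [A_4, B_4, C_4, D_4].$$

Such placement respects the memory constraint of M = 1. Also, divide each sub-file into two equal parts of size $\frac{1}{2} \cdot \frac{F}{4} = \frac{F}{8}$ bits:

$$A_i = [A_i^1, A_i^2], \qquad B_i = [B_i^1, B_i^2], \qquad C_i = [C_i^1, C_i^2], \qquad D_i = [D_i^1, D_i^2],$$

where i = 1, 2, 3, 4. In the second phase, we send the following blocks of size 2-by-$\frac{F}{4m}$:

$$\begin{aligned}
\mathbf{X}(\{1,2,3\}) = \big[\, &\mathbf{h}_1^{\perp} \mathcal{L}^1_{\{2,3\}}(B_3^1, C_2^1) + \mathbf{h}_2^{\perp} \mathcal{L}^1_{\{1,3\}}(A_3^1, C_1^1) + \mathbf{h}_3^{\perp} \mathcal{L}^1_{\{1,2\}}(A_2^1, B_1^1), \\
&\mathbf{h}_1^{\perp} \mathcal{L}^2_{\{2,3\}}(B_3^1, C_2^1) + \mathbf{h}_2^{\perp} \mathcal{L}^2_{\{1,3\}}(A_3^1, C_1^1) + \mathbf{h}_3^{\perp} \mathcal{L}^2_{\{1,2\}}(A_2^1, B_1^1) \,\big] \\
\mathbf{X}(\{1,2,4\}) = \big[\, &\mathbf{h}_1^{\perp} \mathcal{L}^1_{\{2,4\}}(B_4^1, D_2^1) + \mathbf{h}_2^{\perp} \mathcal{L}^1_{\{1,4\}}(A_4^1, D_1^1) + \mathbf{h}_4^{\perp} \mathcal{L}^1_{\{1,2\}}(A_2^2, B_1^2), \\
&\mathbf{h}_1^{\perp} \mathcal{L}^2_{\{2,4\}}(B_4^1, D_2^1) + \mathbf{h}_2^{\perp} \mathcal{L}^2_{\{1,4\}}(A_4^1, D_1^1) + \mathbf{h}_4^{\perp} \mathcal{L}^2_{\{1,2\}}(A_2^2, B_1^2) \,\big] \\
\mathbf{X}(\{1,3,4\}) = \big[\, &\mathbf{h}_1^{\perp} \mathcal{L}^1_{\{3,4\}}(C_4^1, D_3^1) + \mathbf{h}_3^{\perp} \mathcal{L}^1_{\{1,4\}}(A_4^2, D_1^2) + \mathbf{h}_4^{\perp} \mathcal{L}^1_{\{1,3\}}(A_3^2, C_1^2), \\
&\mathbf{h}_1^{\perp} \mathcal{L}^2_{\{3,4\}}(C_4^1, D_3^1) + \mathbf{h}_3^{\perp} \mathcal{L}^2_{\{1,4\}}(A_4^2, D_1^2) + \mathbf{h}_4^{\perp} \mathcal{L}^2_{\{1,3\}}(A_3^2, C_1^2) \,\big] \\
\mathbf{X}(\{2,3,4\}) = \big[\, &\mathbf{h}_2^{\perp} \mathcal{L}^1_{\{3,4\}}(C_4^2, D_3^2) + \mathbf{h}_3^{\perp} \mathcal{L}^1_{\{2,4\}}(B_4^2, D_2^2) + \mathbf{h}_4^{\perp} \mathcal{L}^1_{\{2,3\}}(B_3^2, C_2^2), \\
&\mathbf{h}_2^{\perp} \mathcal{L}^2_{\{3,4\}}(C_4^2, D_3^2) + \mathbf{h}_3^{\perp} \mathcal{L}^2_{\{2,4\}}(B_4^2, D_2^2) + \mathbf{h}_4^{\perp} \mathcal{L}^2_{\{2,3\}}(B_3^2, C_2^2) \,\big].
\end{aligned} \tag{49}$$

Let us focus on the first user. From the above transmissions he recovers:

$$\begin{aligned}
\mathbf{h}_1 \cdot \mathbf{X}(\{1,2,3\}) &= \big[\, (\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}) \mathcal{L}^1_{\{1,3\}}(A_3^1, C_1^1) + (\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}) \mathcal{L}^1_{\{1,2\}}(A_2^1, B_1^1), \\
&\qquad (\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}) \mathcal{L}^2_{\{1,3\}}(A_3^1, C_1^1) + (\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}) \mathcal{L}^2_{\{1,2\}}(A_2^1, B_1^1) \,\big] \\
&= \big[\, \mathcal{L}^1(A_3^1, C_1^1, A_2^1, B_1^1),\ \mathcal{L}^2(A_3^1, C_1^1, A_2^1, B_1^1) \,\big] \\
\mathbf{h}_1 \cdot \mathbf{X}(\{1,2,4\}) &= \big[\, (\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}) \mathcal{L}^1_{\{1,4\}}(A_4^1, D_1^1) + (\mathbf{h}_1 \cdot \mathbf{h}_4^{\perp}) \mathcal{L}^1_{\{1,2\}}(A_2^2, B_1^2), \\
&\qquad (\mathbf{h}_1 \cdot \mathbf{h}_2^{\perp}) \mathcal{L}^2_{\{1,4\}}(A_4^1, D_1^1) + (\mathbf{h}_1 \cdot \mathbf{h}_4^{\perp}) \mathcal{L}^2_{\{1,2\}}(A_2^2, B_1^2) \,\big] \\
&= \big[\, \mathcal{L}^1(A_4^1, D_1^1, A_2^2, B_1^2),\ \mathcal{L}^2(A_4^1, D_1^1, A_2^2, B_1^2) \,\big] \\
\mathbf{h}_1 \cdot \mathbf{X}(\{1,3,4\}) &= \big[\, (\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}) \mathcal{L}^1_{\{1,4\}}(A_4^2, D_1^2) + (\mathbf{h}_1 \cdot \mathbf{h}_4^{\perp}) \mathcal{L}^1_{\{1,3\}}(A_3^2, C_1^2), \\
&\qquad (\mathbf{h}_1 \cdot \mathbf{h}_3^{\perp}) \mathcal{L}^2_{\{1,4\}}(A_4^2, D_1^2) + (\mathbf{h}_1 \cdot \mathbf{h}_4^{\perp}) \mathcal{L}^2_{\{1,3\}}(A_3^2, C_1^2) \,\big] \\
&= \big[\, \mathcal{L}^1(A_4^2, D_1^2, A_3^2, C_1^2),\ \mathcal{L}^2(A_4^2, D_1^2, A_3^2, C_1^2) \,\big].
\end{aligned} \tag{50}$$

(Although user 1 also receives $\mathbf{h}_1 \cdot \mathbf{X}(\{2,3,4\})$, such information is of no value to him.) With the help of its cache contents, the first user can eliminate the undesired terms and obtain:

$$\begin{aligned}
[\mathcal{L}(A_3^1, A_2^1),\ \mathcal{L}'(A_3^1, A_2^1)] &\rightarrow A_3^1, A_2^1 \\
[\mathcal{L}(A_4^1, A_2^2),\ \mathcal{L}'(A_4^1, A_2^2)] &\rightarrow A_4^1, A_2^2 \\
[\mathcal{L}(A_4^2, A_3^2),\ \mathcal{L}'(A_4^2, A_3^2)] &\rightarrow A_4^2, A_3^2.
\end{aligned}$$

Since $A_1^1$ and $A_1^2$ are already available in the first user's cache, he can subsequently recover the whole file A. Similarly, all other users can recover their requested files.

The transmission scheme adopted in (49) consists of four 2-by-$\frac{F}{4m}$ blocks, which will result in the coding delay $T_C = 4 \frac{F}{4m} = \frac{F}{m}$ time slots.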

• M = 2

Consider the cache content placement used in [7]: First divide each file into 6 equal-sized non-overlapping sub-files:

$$A = [A_1, \dots, A_6], \qquad B = [B_1, \dots, B_6], \qquad C = [C_1, \dots, C_6], \qquad D = [D_1, \dots, D_6],$$

and then, fill the caches as follows:

$$\begin{aligned}
Z_1 &= [A_1, A_2, A_3, B_1, B_2, B_3, C_1, C_2, C_3, D_1, D_2, D_3] \\
Z_2 &= [A_1, A_4, A_5, B_1, B_4, B_5, C_1, C_4, C_5, D_1, D_4, D_5] \\
Z_3 &= [A_2, A_4, A_6, B_2, B_4, B_6, C_2, C_4, C_6, D_2, D_4, D_6] \\
Z_4 &= [A_3, A_5, A_6, B_3, B_5, B_6, C_3, C_5, C_6, D_3, D_5, D_6].
\end{aligned}$$

In the second phase, we send the following block of symbols of size 2-by-$\frac{F}{2m}$:

$$\begin{aligned}
\mathbf{X} = \big[\, &\mathbf{h}_1^{\perp} \mathcal{L}^1_{\{2,3,4\}}(B_6, C_5, D_4) + \mathbf{h}_2^{\perp} \mathcal{L}^1_{\{1,3,4\}}(A_6, C_3, D_2) + \mathbf{h}_3^{\perp} \mathcal{L}^1_{\{1,2,4\}}(A_5, B_3, D_1) + \mathbf{h}_4^{\perp} \mathcal{L}^1_{\{1,2,3\}}(A_4, B_2, C_1), \\
&\mathbf{h}_1^{\perp} \mathcal{L}^2_{\{2,3,4\}}(B_6, C_5, D_4) + \mathbf{h}_2^{\perp} \mathcal{L}^2_{\{1,3,4\}}(A_6, C_3, D_2) + \mathbf{h}_3^{\perp} \mathcal{L}^2_{\{1,2,4\}}(A_5, B_3, D_1) + \mathbf{h}_4^{\perp} \mathcal{L}^2_{\{1,2,3\}}(A_4, B_2, C_1), \\
&\mathbf{h}_1^{\perp} \mathcal{L}^3_{\{2,3,4\}}(B_6, C_5, D_4) + \mathbf{h}_2^{\perp} \mathcal{L}^3_{\{1,3,4\}}(A_6, C_3, D_2) + \mathbf{h}_3^{\perp} \mathcal{L}^3_{\{1,2,4\}}(A_5, B_3, D_1) + \mathbf{h}_4^{\perp} \mathcal{L}^3_{\{1,2,3\}}(A_4, B_2, C_1) \,\big].
\end{aligned} \tag{51}$$

Let us focus on the first user, who receives:

$$\begin{aligned}
\big[\, &\mathcal{L}^1(A_4, A_5, A_6, B_2, B_3, C_1, C_3, D_1, D_2), \\
&\mathcal{L}^2(A_4, A_5, A_6, B_2, B_3, C_1, C_3, D_1, D_2), \\
&\mathcal{L}^3(A_4, A_5, A_6, B_2, B_3, C_1, C_3, D_1, D_2) \,\big].
\end{aligned}$$

This user also has the unwanted terms $B_2, B_3, C_1, C_3, D_1, D_2$ in his cache, and after removing them from the above linear combinations he has three different linear combinations of his required terms $A_4$, $A_5$, and $A_6$. After solving these equations, and with the help of $A_1$, $A_2$, and $A_3$ stored in his cache, he can recover the whole file A. Similarly, the other users are able to decode their required files.

The transmit block stated in (51) is of size 2-by-$\frac{F}{2m}$, resulting in $T_C = \frac{F}{2m}$ time slots.


• M = 3

In this case, by the scheme proposed in [7], all four users can get useful information through a single transmission from a single server. Thus, we cannot further reduce the delay by activating the other server. Hence, by activating just one server and based on [7], a coding delay of $T_C = \frac{F}{4m}$ time slots is obtained.

• M = 4

In the case of M = 4, all four files can be stored in the cache of each user, and the required delivery delay in the second phase is zero, i.e., $T_C = 0$.

Appendix C: Details of Example [4] (L = 3, N = 4, I< = 4) 

In this example, we consider the three server case in Example HI and for all values of M = 0,..., 4 present the schemes 
that lead to achievable rates. 

. M = 0 

In this case, we do not have any cache space available at the user locations. Suppose we divide each file into three 
equal-sized non-overlapping parts: 


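$$
A = [A_1, A_2, A_3], \quad B = [B_1, B_2, B_3], \quad C = [C_1, C_2, C_3], \quad D = [D_1, D_2, D_3].
$$

The servers then send the following four 3-by-$\frac{F}{3}$ blocks:

$$
\begin{aligned}
\mathbf{X}(\{1,2,3\}) &= \mathbf{u}^{\{1\}}_{\{1,2,3\}} A_1 + \mathbf{u}^{\{2\}}_{\{1,2,3\}} B_1 + \mathbf{u}^{\{3\}}_{\{1,2,3\}} C_1, \\
\mathbf{X}(\{1,2,4\}) &= \mathbf{u}^{\{1\}}_{\{1,2,4\}} A_2 + \mathbf{u}^{\{2\}}_{\{1,2,4\}} B_2 + \mathbf{u}^{\{4\}}_{\{1,2,4\}} D_1, \\
\mathbf{X}(\{1,3,4\}) &= \mathbf{u}^{\{1\}}_{\{1,3,4\}} A_3 + \mathbf{u}^{\{3\}}_{\{1,3,4\}} C_2 + \mathbf{u}^{\{4\}}_{\{1,3,4\}} D_2, \\
\mathbf{X}(\{2,3,4\}) &= \mathbf{u}^{\{2\}}_{\{2,3,4\}} B_3 + \mathbf{u}^{\{3\}}_{\{2,3,4\}} C_3 + \mathbf{u}^{\{4\}}_{\{2,3,4\}} D_3,
\end{aligned}
\tag{52}
$$

where we require

$$
\begin{aligned}
\mathbf{u}^{T}_{S} &\perp \mathbf{h}_j, \quad \forall j \in S \setminus T, \\
\mathbf{u}^{T}_{S} &\not\perp \mathbf{h}_j, \quad \forall j \in T.
\end{aligned}
\tag{53}
$$

In this example, since we have three-dimensional transmit vectors (three servers) and $|S \setminus T| = 2$, such vectors can be found.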
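The existence claim can be checked concretely. The following is a minimal Python sketch, assuming for illustration that the channel vectors live in a prime field GF(p) with a made-up size p = 101 and made-up entries for h_1, h_2, h_3 (none of these values come from the paper). With three servers, a vector orthogonal to two given channel vectors is obtained from their cross product, and it is generically not orthogonal to the third.

```python
# Hypothetical prime field size, chosen only for illustration.
P = 101


def dot(a, b, p=P):
    """Inner product of two length-3 vectors over GF(p)."""
    return sum(x * y for x, y in zip(a, b)) % p


def cross(a, b, p=P):
    """Cross product over GF(p); the result is orthogonal to both a and b."""
    return [
        (a[1] * b[2] - a[2] * b[1]) % p,
        (a[2] * b[0] - a[0] * b[2]) % p,
        (a[0] * b[1] - a[1] * b[0]) % p,
    ]


# Made-up channel vectors of users 1, 2 and 3 (assumptions, not from the paper).
h1, h2, h3 = [1, 2, 3], [4, 5, 6], [7, 8, 10]

# Plays the role of u^{1}_{1,2,3}: it must be nulled at users 2 and 3
# (the set S \ T) and remain visible at user 1 (the set T).
u = cross(h2, h3)

print(u, dot(u, h2), dot(u, h3), dot(u, h1))
```

Here the two orthogonality constraints remove two of the three dimensions, which is why |S \ T| = 2 is exactly what three servers can handle.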


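Let's focus on the first user, who receives:

$$
\begin{aligned}
\mathbf{h}_1 \cdot \mathbf{X}(\{1,2,3\}) &= (\mathbf{h}_1 \cdot \mathbf{u}^{\{1\}}_{\{1,2,3\}}) A_1 + (\mathbf{h}_1 \cdot \mathbf{u}^{\{2\}}_{\{1,2,3\}}) B_1 + (\mathbf{h}_1 \cdot \mathbf{u}^{\{3\}}_{\{1,2,3\}}) C_1 = (\mathbf{h}_1 \cdot \mathbf{u}^{\{1\}}_{\{1,2,3\}}) A_1, \\
\mathbf{h}_1 \cdot \mathbf{X}(\{1,2,4\}) &= (\mathbf{h}_1 \cdot \mathbf{u}^{\{1\}}_{\{1,2,4\}}) A_2 + (\mathbf{h}_1 \cdot \mathbf{u}^{\{2\}}_{\{1,2,4\}}) B_2 + (\mathbf{h}_1 \cdot \mathbf{u}^{\{4\}}_{\{1,2,4\}}) D_1 = (\mathbf{h}_1 \cdot \mathbf{u}^{\{1\}}_{\{1,2,4\}}) A_2, \\
\mathbf{h}_1 \cdot \mathbf{X}(\{1,3,4\}) &= (\mathbf{h}_1 \cdot \mathbf{u}^{\{1\}}_{\{1,3,4\}}) A_3 + (\mathbf{h}_1 \cdot \mathbf{u}^{\{3\}}_{\{1,3,4\}}) C_2 + (\mathbf{h}_1 \cdot \mathbf{u}^{\{4\}}_{\{1,3,4\}}) D_2 = (\mathbf{h}_1 \cdot \mathbf{u}^{\{1\}}_{\{1,3,4\}}) A_3.
\end{aligned}
\tag{54}
$$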


The first user can then successfully decode its requested file. Similarly, the other users will also be able to decode their requested files.

The transmission stated in (52) consists of four 3-by-$\frac{F}{3}$ blocks, resulting in $T_C = \frac{4}{3}$ time slots.
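To illustrate the M = 0 delivery step, the sketch below simulates one block, X({1,2,3}), and checks that users 1, 2 and 3 each see only their own sub-file. It is a toy under stated assumptions: a hypothetical prime field GF(101), made-up channel vectors, made-up three-symbol sub-files, and cross products as one possible way to pick the zero-forcing vectors. None of these specifics are taken from the paper.

```python
P = 101  # hypothetical prime field size (assumption)


def dot(a, b):
    """Inner product over GF(P)."""
    return sum(x * y for x, y in zip(a, b)) % P


def cross(a, b):
    """Cross product over GF(P); orthogonal to both inputs."""
    return [
        (a[1] * b[2] - a[2] * b[1]) % P,
        (a[2] * b[0] - a[0] * b[2]) % P,
        (a[0] * b[1] - a[1] * b[0]) % P,
    ]


def transmit_block(precoders, subfiles):
    """X = sum_k u_k * subfile_k, a 3-by-len(subfile) matrix (one row per server)."""
    length = len(subfiles[0])
    return [
        [sum(u[row] * s[col] for u, s in zip(precoders, subfiles)) % P
         for col in range(length)]
        for row in range(3)
    ]


def receive(h, block):
    """A user observes h . X, column by column."""
    return [
        sum(h[row] * block[row][col] for row in range(3)) % P
        for col in range(len(block[0]))
    ]


# Made-up channel vectors and sub-files (assumptions for illustration).
h = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 10]}
A1, B1, C1 = [5, 17, 42], [9, 0, 77], [100, 3, 8]

# u[k] is orthogonal to the channels of the other two users in {1, 2, 3}.
u = {1: cross(h[2], h[3]), 2: cross(h[1], h[3]), 3: cross(h[1], h[2])}
X = transmit_block([u[1], u[2], u[3]], [A1, B1, C1])

decoded = {}
for k in (1, 2, 3):
    gain = dot(h[k], u[k])          # non-zero scalar in front of the wanted sub-file
    inv = pow(gain, P - 2, P)       # modular inverse via Fermat's little theorem
    decoded[k] = [(v * inv) % P for v in receive(h[k], X)]

print(decoded)
```

Each user only needs to divide by a known non-zero scalar, because the other two terms are nulled by construction.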

• M = 1

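The cache content placement is the same as in the scheme of Maddah-Ali and Niesen. Then, the transmit block sent by the three servers is:

$$
\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3],
\tag{55}
$$

where (for $\omega = 1, 2, 3$)

$$
\begin{aligned}
\mathbf{X}_\omega ={} & \mathbf{u}^{\{1,2\}}_{\{1,2,3,4\}} L^{\omega}_{\{1,2\}}(A_2, B_1) + \mathbf{u}^{\{1,3\}}_{\{1,2,3,4\}} L^{\omega}_{\{1,3\}}(A_3, C_1) + \mathbf{u}^{\{1,4\}}_{\{1,2,3,4\}} L^{\omega}_{\{1,4\}}(A_4, D_1) \\
& + \mathbf{u}^{\{2,3\}}_{\{1,2,3,4\}} L^{\omega}_{\{2,3\}}(B_3, C_2) + \mathbf{u}^{\{2,4\}}_{\{1,2,3,4\}} L^{\omega}_{\{2,4\}}(B_4, D_2) + \mathbf{u}^{\{3,4\}}_{\{1,2,3,4\}} L^{\omega}_{\{3,4\}}(C_4, D_3).
\end{aligned}
\tag{56}
$$

Now let's focus on the first user, who receives:

$$
\mathbf{y}_1 = \mathbf{h}_1 \cdot \mathbf{X} = [\mathbf{h}_1 \cdot \mathbf{X}_1, \; \mathbf{h}_1 \cdot \mathbf{X}_2, \; \mathbf{h}_1 \cdot \mathbf{X}_3].
\tag{57}
$$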


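Let's first consider the term:

$$
\begin{aligned}
\mathbf{h}_1 \cdot \mathbf{X}_1 ={} & (\mathbf{h}_1 \cdot \mathbf{u}^{\{1,2\}}_{\{1,2,3,4\}}) L^{1}_{\{1,2\}}(A_2, B_1) + (\mathbf{h}_1 \cdot \mathbf{u}^{\{1,3\}}_{\{1,2,3,4\}}) L^{1}_{\{1,3\}}(A_3, C_1) + (\mathbf{h}_1 \cdot \mathbf{u}^{\{1,4\}}_{\{1,2,3,4\}}) L^{1}_{\{1,4\}}(A_4, D_1) \\
& + (\mathbf{h}_1 \cdot \mathbf{u}^{\{2,3\}}_{\{1,2,3,4\}}) L^{1}_{\{2,3\}}(B_3, C_2) + (\mathbf{h}_1 \cdot \mathbf{u}^{\{2,4\}}_{\{1,2,3,4\}}) L^{1}_{\{2,4\}}(B_4, D_2) + (\mathbf{h}_1 \cdot \mathbf{u}^{\{3,4\}}_{\{1,2,3,4\}}) L^{1}_{\{3,4\}}(C_4, D_3) \\
={} & (\mathbf{h}_1 \cdot \mathbf{u}^{\{1,2\}}_{\{1,2,3,4\}}) L^{1}_{\{1,2\}}(A_2, B_1) + (\mathbf{h}_1 \cdot \mathbf{u}^{\{1,3\}}_{\{1,2,3,4\}}) L^{1}_{\{1,3\}}(A_3, C_1) + (\mathbf{h}_1 \cdot \mathbf{u}^{\{1,4\}}_{\{1,2,3,4\}}) L^{1}_{\{1,4\}}(A_4, D_1).
\end{aligned}
$$

As this user has cached $B_1, C_1, D_1$ in the first phase, it can remove these terms from this linear combination to obtain

$$
L(A_2, A_3, A_4).
$$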

Thus, user 1 can recover a linear combination of its requested sub-files from $\mathbf{h}_1 \cdot \mathbf{X}_1$. From $\mathbf{h}_1 \cdot \mathbf{X}_2$ and $\mathbf{h}_1 \cdot \mathbf{X}_3$ it can obtain two other independent linear combinations, from which it can recover all three sub-files $A_2, A_3, A_4$. Since it already has $A_1$ in its cache, it can decode the whole file $A$. Similarly, all the other users can also decode their requested files.

The transmit block stated in (55) is a single 3-by-$\frac{3F}{4}$ block, resulting in $T_C = \frac{3}{4}$ time slots.
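The M = 1 delivery can be simulated end to end. The sketch below is a toy model under explicit assumptions: a hypothetical prime field GF(101), made-up channel vectors (any three of which are linearly independent), each sub-file shrunk to a single field symbol, and deterministic Vandermonde-style coefficients standing in for the random linear combinations L^ω_T. It is meant only to illustrate the zero-forcing, cache-removal and equation-solving steps, not to reproduce the paper's exact construction.

```python
from itertools import combinations

P = 101  # hypothetical prime field size (assumption)
USERS = (1, 2, 3, 4)
# Made-up channel vectors; every 3-subset is linearly independent mod P.
H = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 10], 4: [1, 0, 1]}
# W[k][i]: sub-file i of the file requested by user k (user 1 wants A, etc.).
# User j caches W[k][j] for every k. Values are arbitrary illustrative symbols.
W = {k: {i: (10 * k + i) % P for i in USERS} for k in USERS}
PAIRS = list(combinations(USERS, 2))
# Distinct non-zero bases; COEF[T] ** w is the stand-in coefficient of L^w_T.
COEF = {T: idx + 2 for idx, T in enumerate(PAIRS)}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % P


def cross(a, b):
    return [(a[1] * b[2] - a[2] * b[1]) % P,
            (a[2] * b[0] - a[0] * b[2]) % P,
            (a[0] * b[1] - a[1] * b[0]) % P]


def precoder(T):
    """u^T_S: orthogonal to the channels of the two users outside T."""
    c, d = (j for j in USERS if j not in T)
    return cross(H[c], H[d])


def combo(T, w):
    """Stand-in for L^w_T applied to the two sub-files exchanged within T."""
    a, b = T
    return pow(COEF[T], w, P) * (W[a][b] + W[b][a]) % P


def transmit(w):
    """X_w: the symbols sent by the three servers in block w."""
    x = [0, 0, 0]
    for T in PAIRS:
        u, s = precoder(T), combo(T, w)
        x = [(xi + ui * s) % P for xi, ui in zip(x, u)]
    return x


def solve_mod(matrix, rhs):
    """Gauss-Jordan elimination over GF(P) for a square invertible system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [(v * inv) % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def decode(j):
    """User j removes cached terms, then solves three equations in three unknowns."""
    others = [b for b in USERS if b != j]
    matrix, rhs = [], []
    for w in (1, 2, 3):
        y = dot(H[j], transmit(w))
        row = []
        for b in others:
            T = tuple(sorted((j, b)))
            g = dot(H[j], precoder(T)) * pow(COEF[T], w, P) % P
            y = (y - g * W[b][j]) % P  # W[b][j] is in user j's cache
            row.append(g)
        matrix.append(row)
        rhs.append(y)
    return dict(zip(others, solve_mod(matrix, rhs)))


print(decode(1))  # the three sub-files of A that user 1 had not cached
```

Terms belonging to pairs that do not contain user j vanish at that user because of the orthogonality condition, so three received symbols suffice for three unknown sub-files.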

• M = 2

In this case, we only activate two of the servers, and thus the problem reduces to the case with L = 2, N = 4, K = 4, for which we achieved $T_C = \frac{1}{2}$.

• M = 3

In this case, we only activate one server, and thus the problem reduces to the single-server setting of Maddah-Ali and Niesen, with $T_C = \frac{1}{4}$.

• M = 4

In this case we have $T_C = 0$.


